We study a class of stochastic dynamic games that exhibit strategic complementarities between players;
formally, in the games we consider, the payoff of a player has increasing differences between her own state 
and the empirical distribution of the states of other players. Such games can be used to model a diverse set 
of applications, including network security models, recommender systems, and dynamic search in markets. 
Stochastic games are generally difficult to analyze, and these difficulties are only exacerbated when the 
number of players is large (as might be the case in the preceding examples). 

We consider an approximation methodology called mean field equilibrium to study these games. In such 
an equilibrium, each player reacts to only the long run average state of other players. We find sufficient
conditions for the existence of a mean field equilibrium in such games. Furthermore, as a simple consequence
of this existence theorem, we obtain several natural monotonicity properties. We show that there exist a 
"largest" and a "smallest" equilibrium among all those where the equilibrium strategy used by a player is 
nondecreasing, and we also show that players converge to each of these equilibria via natural myopic learning 
dynamics; as we argue, these dynamics are more reasonable than the standard best response dynamics. We 
also provide sensitivity results, where we quantify how the equilibria of such games move in response to 
changes in parameters of the game (e.g., the introduction of incentives to players). 



1. Introduction 

This paper studies a class of games that exhibit strategic complementarities between players. A 
strategic complementarity exists if, informally, "higher" actions by other players increase the return 
to higher actions for a given player. Games with strategic complementarities are a powerful modeling
tool, applicable in a wide range of situations, including: systems with positive network effects
(such as network security models, recommender systems, and social networks); coordination problems;
dynamic search in markets; social learning; and oligopoly models (e.g., quantity or price
competition with complementarities).

Our focus in this paper is on dynamic games with strategic complementarities. Strategic complementarities
have long provided a fertile analytical ground for static game theoretic models; see,
e.g., Milgrom and Roberts (1990), Vives (1990), and Topkis (1998). However, the literature on
dynamic games with complementarities has emerged relatively recently by comparison. Much of the
attention in prior work on such games has focused on developing existence proofs for equilibrium; 
see, e.g., Curtat (1996), Amir (2002, 2005), Sleet (2001), Vives (2009) for these results. 

In this paper we consider a class of dynamic games referred to as stochastic games; in these games 
agents' actions directly affect underlying state variables that influence their payoff (Shapley 1953). 
The standard solution concept for stochastic games is Markov perfect equilibrium (Fudenberg and 
Tirole 1991). Despite the previously cited existence results for Markov perfect equilibria in games 
with complementarities, there remain two significant obstacles, particularly as the number of players
grows large. First is computability: the state space of the preceding games expands in dimension
with the number of players, and thus the "curse of dimensionality" kicks in, making computation 
of Markov perfect equilibria essentially infeasible (Pakes and McGuire 2001, Doraszelski and Pakes 
2007). Second is plausibility: as the number of players grows large, it becomes increasingly difficult 
to believe that individual players track the exact behavior of the other agents. Rather than treat 
the growth of the population as an impediment to analysis, this paper addresses these obstacles 
by exploiting an asymptotic regime where the number of players grows large to simplify analysis 
of equilibria. 

We consider an approximation methodology, which we refer to as mean field equilibrium, where
agents optimize only with respect to long run average estimates of the distribution of other players'
states; this notion has been utilized across a range of work in economics, operations research,
and control (as we discuss below). In a mean field equilibrium, individuals take a simpler view of 
the world: they postulate that fluctuations in the empirical distribution of other players' states 
have "averaged out" due to large scale, and thus optimize holding the state distribution of other 
players fixed. Mean field equilibrium requires a consistency check: the postulated state distribution 
must arise from the optimal strategies agents compute. 

Our results provide valuable insight into the structure of mean field equilibria in such games, as 
well as computational tools to determine such equilibria. To motivate our results, we first provide 
several examples of stochastic games with complementarities where the approach taken in this 
paper applies. These examples — particularly the first four — often exhibit large numbers of players, 
and thus the benefits of mean field equilibrium are significant. We demonstrate in Section 8 that 
each of these examples can be analyzed using the results we develop in this paper. 

Example 1 (Interdependent security). In interdependent security games, as introduced in 
Kunreuther and Heal (2003), a large number of agents make individual decisions about their own 
security. However, the ultimate security of an agent depends on the security decisions made by
other agents. For example, imagine a network of computers where each individual user makes an
investment in keeping her own machine secure. This investment may be in the form of advanced 
anti-virus filters, firewalls, etc. While these investments improve the security of the individual 
computer, it can still be affected if the other computers in the network are not properly secured. In 
the interdependent security games we consider, agents take actions at some cost to improve their 
own security level, and earn a payoff each period that depends on whether or not a security breach 
occurs. The fact that the probability of a security breach is influenced by others' security levels 
introduces strategic complementarities into the stochastic game. □ 

Example 2 (Collaborative filtering). Many large online recommendation systems, such as 
those used by Netflix and Amazon, rely on collaborative filtering. In such systems, if an individual
puts forth greater effort in maintaining their profile, the recommendations they receive will
improve. However, the recommendations other individuals receive improve as well, and typically
other individuals will feel a stronger incentive to exert additional effort to maintain their profile
in this case. In the absence of such effort, the profile of an agent becomes stale and useless
both to her and others in the system. Thus collaborative filtering systems exhibit strong strategic 
complementarities. □ 

Example 3 (Dynamic search with learning). In dynamic search models, traders in a market 
exert effort to find trading partners (Diamond 1982). Such models are commonly used to study, 
e.g., decentralized matching in labor markets. We consider a model where at each time step, traders 
also gain experience by exerting effort; this experience makes future effort more productive. Of 
course traders' experience increases as they put forth more effort; but their experience also increases 
as others put forth more effort since this increases the likelihood of useful interactions per unit 
effort. This creates strategic complementarities between the players; such a model was considered 
by Curtat (1996). □ 

Example 4 (Coordination games). There exist many examples in operations and economics 
where agents are trying to coordinate on a common goal; for example, this is the case when firms 
try to coordinate on a common standard. In a coordination game, a collection of agents take 
individual actions to converge on a common state. One such stylized model is the linear-quadratic 
decentralized coordination problem studied by Huang et al. (2005). Agents can change their state 
by exerting effort at some cost. Further, each agent incurs an additional state-dependent cost each 
time period; this cost is quadratic in the distance to the average of other agents' states. This type 
of game can be shown to exhibit strategic complementarities between agents. □

Example 5 (Oligopolies and complementary goods). Consider competition among firms
producing complementary goods. In particular, suppose firms have effective monopolies in their own 
markets, but that their goods are complements, so that the consumption of one good will increase 
the demand and consumption of others. Such models naturally exhibit strategic complementarities. 

One potential issue in using mean field models to analyze oligopolies with complementary goods 
is that the number of firms may not be too large, thus raising questions about the validity of a 
mean field limit in the first place. However, even in such a setting mean field models have value, 
because they provide structural insight into optimal strategies under a model of rationality that is 
perhaps more plausible, as discussed above. Indeed, econometric analysis using mean field models 
of dynamic oligopolies has proven valuable for a range of industries with relatively small numbers 
of firms (see Weintraub et al. (2010) for examples). □ 

Our main results provide conditions that ensure existence of mean field equilibria in stochastic
games with complementarities. We also establish that simple learning procedures converge to
equilibria, and provide insight into sensitivity of equilibria to parameter changes. We consider a 
general class of models with parsimonious assumptions over model primitives that ensure strategic 
complementarities. In particular, our model class allows players to be coupled both via their payoff 
function and state transitions, i.e., players' payoffs and state transitions can depend on states or 
actions of other players. We also discuss extensions of our results to models with multidimensional 
state and action spaces, and with heterogeneity among players. Details of our results follow. 

1. Structural characterization of mean field equilibrium. We establish existence of a mean field 
equilibrium in a general stochastic game model using lattice theoretic techniques. Lattice theoretic 
methods are typically applied in games with complementarities; the key techniques we use are due 
to Tarski (1955), Kamae et al. (1977), Hopenhayn and Prescott (1992), Zhou (1994), and Topkis 
(1998). Despite the use of lattice theoretic techniques in our analysis, existence of equilibria in our 
game cannot be inferred from existence results for other games in the literature. Moreover, we show 
that there exists a "largest" and "smallest" equilibrium among the set of all mean field equilibria 
with nondecreasing strategies. Thus, in particular, there is a natural dominance relationship among 
the mean field equilibria of a given stochastic game with complementarities. This is particularly 
valuable in dynamic games, because our characterization applies to the distribution of states of 
agents in equilibrium. 

We note that prior literature has established existence of equilibrium in stochastic games with 
complementarities; however, these results typically also require use of topological fixed-point theorems
such as Kakutani's theorem (Curtat 1996, Amir 2002, 2005). More closely related to our
paper is the work of Sleet (2001), who considers mean field equilibria of a dynamic price-setting
game with stochastic, exogenous firm-specific demand shocks per period. The general analytical 
techniques in this paper can be applied to recover the existence result for that game. 

2. Convergence to equilibrium. We provide two convergence results. First, we study a standard 
best response dynamic (BRD). In this algorithm, at each time step, each agent computes the 
stationary population state distribution that would be induced by the current strategies of others, 
and in turn computes the best response to that state distribution. Using monotonicity properties 
derived in establishing the existence of mean field equilibrium, we show that BRD converges. 

However, BRD is unsatisfying both computationally and practically. From a computational 
standpoint, BRD requires computation of a stationary distribution given the current strategy 
choices of agents in the system; this is in principle a complex procedure to execute at each iteration. 
More importantly, BRD is an implausible approach to play in an actual game: it is unlikely that 
agents would explicitly compute the stationary distribution their competitors would obtain. 

Instead, we consider a more natural form of myopic learning dynamics (MLD) among the
players; convergence of MLD is a central insight of our paper. In particular, suppose that initially, 
each agent starts at the lowest (resp., highest) possible state. At each time step, agents observe the 
current empirical population state distribution, and conjecture that this distribution will remain 
constant for all time; with this conjecture they compute an optimal strategy, and play in the 
next period according to that strategy. At the next time step, the state distribution will evolve, 
and agents repeat the same heuristic. We show that this dynamic converges to the lowest (resp., 
highest) mean field equilibrium among all equilibria with nondecreasing strategies. 

Note that MLD resolves both the computability and plausibility issues raised above. First, it is 
a natural, simple, implementable algorithm for finding a mean field equilibrium; indeed, MLD has 
some similarities with model predictive control or receding horizon control (Garcia et al. 1989), both 
popular approaches to complex dynamic control problems. Second, it corresponds to a learning 
dynamic that demands only a weak form of rationality and forecasting from the players, and yet 
yields an equilibrium in the limit. 
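
To make the contrast between the two procedures concrete, the following is a minimal sketch on a two-state toy model. Everything in it is an assumption made for illustration and does not come from the paper: the payoff `x * mean(f) - COST * a`, the persistence parameter `RHO`, the effort-dependent redraw probability, and the function names `best_response`, `push`, `brd`, and `mld`. The toy kernel does not depend on the population state, so complementarity enters through the payoff only.

```python
# Two-state toy model; every number below is an illustrative assumption.
STATES = (0, 1)      # low and high state
ACTIONS = (0, 1)     # no effort and effort
BETA, COST, RHO = 0.9, 0.3, 0.5


def payoff(x, a, f):
    # Increasing differences: a high own state is worth more when the
    # population mean state is high.
    return x * sum(y * f[y] for y in STATES) - COST * a


def kernel(x, a, f):
    # With probability RHO the state persists; otherwise it is redrawn and is
    # high with probability 0.1 + 0.8 * a. The argument f is unused in this toy.
    p_high = 0.1 + 0.8 * a
    row = [(1 - RHO) * (1 - p_high), (1 - RHO) * p_high]
    row[x] += RHO
    return row


def best_response(f, sweeps=200):
    # Value iteration with the population state frozen at f; returns the
    # greedy action for each own state (an oblivious strategy).
    v = [0.0, 0.0]

    def q(x, a):
        row = kernel(x, a, f)
        return payoff(x, a, f) + BETA * sum(row[y] * v[y] for y in STATES)

    for _ in range(sweeps):
        v = [max(q(x, a) for a in ACTIONS) for x in STATES]
    return [max(ACTIONS, key=lambda a: q(x, a)) for x in STATES]


def push(f, mu):
    # One period of play: f'(y) = sum_x f(x) * P(y | x, mu(x), f).
    rows = [kernel(x, mu[x], f) for x in STATES]
    return [sum(f[x] * rows[x][y] for x in STATES) for y in STATES]


def brd(mu, rounds=20):
    # Best response dynamics: solve for a stationary distribution under the
    # current strategy, then best-respond to it.
    for _ in range(rounds):
        f = [0.5, 0.5]
        for _ in range(200):      # crude stationary-distribution solve
            f = push(f, mu)
        mu = best_response(f)
    return mu, f


def mld(f, periods=60):
    # Myopic learning dynamics: best-respond to today's distribution as if it
    # were permanent, play one period, and repeat.
    for _ in range(periods):
        mu = best_response(f)
        f = push(f, mu)
    return mu, f


print(mld([1.0, 0.0]))   # everyone starts in the lowest state
print(mld([0.0, 1.0]))   # everyone starts in the highest state
print(brd([0, 0]), brd([1, 1]))
```

In this toy model, the runs started from the bottom settle at the equilibrium with about 10% of players in the high state and no effort, while the runs started from the top settle at about 90% with effort. The only difference between the procedures is that `brd` solves for a stationary distribution in every round, whereas `mld` advances the population by a single period.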

3. Separable stochastic games. Although appealing, the general theory does pose some significant
issues in application: the complementarity requirements on model primitives may preclude
important and interesting cases of practical interest. Complementarity is a strong requirement, but 
also brittle: a model that does not appear to satisfy the assumptions a priori may do so through 
a judicious change of variables. We employ this fact to show that a range of games that do not 
satisfy the assumptions of our baseline model can be studied by a suitable change of variables,
provided that the payoff is separable in the state and action of a given player, often a relatively
mild assumption. Notably, models with linear dynamics fall in this class. This greatly expands the 
set of models that can be analyzed within our framework. 

4. Sensitivity. Finally, essentially for free, the complementarity structure allows us to analyze 
changes in the equilibrium in response to changes in parameters of the game. In particular, we 
can predict shifts (in a first order stochastic dominance sense) of the equilibrium state distribution 
of players in response to exogenous parameter changes. Such sensitivity analysis, or comparative 
statics, allows our model to address, e.g., the value of incentives to increase security levels, or the 
value of increasing the quality of recommendations by a given factor. 
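
The first order stochastic dominance comparison used in these sensitivity statements is easy to check on a finite, ordered state grid: a distribution dominates another when its cumulative distribution function lies weakly below the other's at every state. The sketch below assumes distributions given as lists of probabilities over a common grid; the function name `first_order_dominates` and the tolerance are illustrative choices, not notation from the paper.

```python
def first_order_dominates(g, f, tol=1e-12):
    """Return True if g first-order stochastically dominates f.

    g and f are lists of probabilities over the same increasing state grid.
    g dominates f when CDF_g(x) <= CDF_f(x) at every grid point x.
    """
    if len(g) != len(f):
        raise ValueError("distributions must share the same state grid")
    cdf_g = cdf_f = 0.0
    for mass_g, mass_f in zip(g, f):
        cdf_g += mass_g
        cdf_f += mass_f
        if cdf_g > cdf_f + tol:
            return False
    return True


low = [0.5, 0.3, 0.2]    # mass concentrated on low states
high = [0.2, 0.3, 0.5]   # mass concentrated on high states
print(first_order_dominates(high, low))   # True
print(first_order_dominates(low, high))   # False
```

Dominance is only a partial order: for `[0.4, 0.2, 0.4]` and `[0.3, 0.4, 0.3]` neither distribution dominates the other, which is why statements about a "largest" and "smallest" equilibrium carry real content.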

The remainder of the paper is organized as follows. In Section 2 we introduce our basic stochastic 
game model as well as the formal definition of mean field equilibrium. Notably, we also discuss a 
justification for the use of mean field equilibrium: that it approximates equilibria of finite games 
well. This approximation property has been developed in a variety of specific contexts in the past 
(see, e.g., Glynn 2004, Huang et al. 2006, Weintraub et al. 2008, and Tembine et al. 2009), and in 
our context we apply the methodology developed in Adlakha et al. (2010) (inspired by Weintraub 
et al. 2008) to justify mean field equilibrium as a limiting notion of equilibrium. 

Next, in Section 3, we define stochastic games with complementarities. We then prove our first 
main result: that a mean field equilibrium exists for a stochastic game with complementarities. In 
Section 3.2, we show that equilibria are "ordered," in the sense that there exists a smallest and 
largest mean field equilibrium among all those where the equilibrium strategy is nondecreasing. In 
Section 4, we prove convergence of both the BRD and MLD algorithms described above. We also 
discuss the performance of MLD in finite systems. 

In Section 5, we provide comparative statics results for the games under consideration. In Section 6,
we extend our results to cover games where players' payoffs and transition kernels may
depend on the actions of others, rather than their states. In Section 7 we consider separable stochastic
games with complementarities (as described above), and establish that these are a special case
of our basic model of stochastic games with complementarities. 

In Section 8, we revisit each of the examples described above. In particular, we provide formal 
verification that these examples satisfy the assumptions made in the paper to obtain existence 
and convergence results. Finally, in Section 9, we study a particular instance of an interdependent 
security game. We use this game to illustrate several computational insights, including verification
of comparative statics results, as well as exploration of the performance of the MLD dynamic
described above. Section 10 concludes with a discussion of extensions to include both player heterogeneity
(i.e., type information) and multidimensional state and/or action spaces.

We conclude this section by surveying related work on mean field equilibrium. The notion of mean field
equilibrium is inspired by mean field models in physics, where large systems exhibit macroscopic 
behavior that is considerably more tractable than their microscopic description. (See, e.g., Mezard 
and Montanari (2009) for background, and Blume (1993) and Morris (2000) for related ideas applied 
to static games.) In the context of stochastic games, mean field equilibrium and related approaches 
have been proposed under a variety of monikers across economics and engineering; see, e.g., studies
of anonymous sequential games (Jovanovic and Rosenthal 1988, Bergin and Bernhardt 1995);
stationary equilibrium (Hopenhayn 1992); dynamic stochastic general equilibrium in macroeconomic
modeling (Stokey et al. 1989); Nash certainty equivalent control (Huang et al. 2006, 2007);
mean field games (Lasry and Lions 2007); oblivious equilibrium (Weintraub et al. 2008, 2010); and 
dynamic user equilibrium (Friesz et al. 1993, Wunderlich et al. 2000). Mean field equilibrium has 
also been studied in recent works on information percolation models (Duffie et al. 2009), sensitivity 
analysis in aggregate games (Acemoglu and Jensen 2009), coupling of oscillators (Yin et al. 2010), 
and in scaling behavior of markets (Bodoh-Creed 2010). 

2. Model and Definitions 

In this section we begin with preliminaries. We define a general model of a stochastic game in 
Section 2.1; in the games we consider, agents take actions to update their own states, and their 
payoffs and state transitions may be affected by the states of others. Next, in Section 2.2, we 
define mean field equilibrium, and in Section 2.3 we provide a formal justification for mean field 
equilibrium as an approximation to equilibria in games with a large finite number of players. 
Finally, in Section 2.4, we discuss lattice-theoretic preliminaries necessary for the analysis in the 
sequel. 

2.1. Stochastic Games 

We consider a game played among $m$ players. A stochastic game is a tuple $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$ defined
as follows.

Time. The game is played in discrete time, with time periods indexed by $t = 0, 1, 2, \ldots$.

State. The state of player $i$ at time $t$ is denoted by $x_{i,t} \in \mathcal{X}$, where $\mathcal{X} \subset \mathbb{R}$ is compact. We use
$x_{-i,t}$ to denote the state of all players except player $i$ at time $t$.

Action. The action taken by player $i$ at time $t$ is denoted by $a_{i,t}$. The set of feasible actions when
the player is in state $x$ is a compact set $\mathcal{A}(x) \subset \mathbb{R}$. We let $\mathcal{A} = \bigcup_{x \in \mathcal{X}} \mathcal{A}(x)$, and assume that $\mathcal{A}$ is
compact as well.

Transition probabilities. The state of a player evolves according to the following Markov process.
If the state of player $i$ at time $t$ is $x_{i,t} = x$, the player takes an action $a_{i,t} = a \in \mathcal{A}(x)$ at time $t$, and
the state of every other player at time $t$ is $x_{-i,t} = y$, then the next state is distributed according
to the Borel probability measure $\mathbf{P}(\cdot \mid x, a, y)$, where for Borel sets $S \subset \mathcal{X}$,

$$\mathbf{P}(S \mid x, a, y) = \mathrm{Prob}\left(x_{i,t+1} \in S \mid x_{i,t} = x,\ a_{i,t} = a,\ x_{-i,t} = y\right). \tag{1}$$

Further, given $x_{i,t}$, $a_{i,t}$, and $x_{-i,t}$, the next state $x_{i,t+1}$ is conditionally independent of all other
past history of the game.

Payoff. The single period payoff to player $i$ at time $t$ is $\pi(x_{i,t}, a_{i,t}, x_{-i,t}) \in \mathbb{R}$. Note that all players
have the same payoff function, and it is independent of the actions taken by other players.

Discount factor. The players discount their future payoff by a discount factor $0 < \beta < 1$. Thus
player $i$'s infinite horizon payoff is given by:

$$\sum_{t=0}^{\infty} \beta^t \pi(x_{i,t}, a_{i,t}, x_{-i,t}).$$


It may initially appear unusual that we do not include the number of players as part of the 
specification of the game; however, this choice is deliberate. We ultimately study $\Gamma$ in a limiting
regime where the number of players grows large, and as a result, mean field equilibrium is defined
without regard to a fixed finite number of players. For this reason we do not include $m$ in the tuple
defining $\Gamma$. (See the next section for further discussion of the motivation for mean field equilibrium.)

In the model described above, the players are coupled to each other via their state evolution 
and the payoff function. In a variety of games, this coupling between players is independent of the 
identity of the players. The notion of anonymity captures scenarios where the interaction between 
players is via aggregate information about the state. Let $f^{(m)}_{-i,t}(y)$ denote the fraction of players
(excluding player $i$) whose state is $y$ at time $t$, i.e.:

$$f^{(m)}_{-i,t}(y) = \frac{1}{m-1} \sum_{j \neq i} \mathbf{1}_{\{x_{j,t} = y\}}, \tag{2}$$

where $\mathbf{1}_{\{x_{j,t} = y\}}$ is the indicator function that the state of player $j$ at time $t$ is $y$. We refer to $f^{(m)}_{-i,t}$
as the population state at time $t$ (from player $i$'s point of view).

Definition 1 (Anonymous Stochastic Game). A stochastic game $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$ is called
an anonymous stochastic game if the transition probability measure and payoff for player $i$ depend
on $x_{-i,t}$ only through $f^{(m)}_{-i,t}$. Through an abuse of notation, we write the transition probability
measure as $\mathbf{P}(\cdot \mid x_{i,t}, a_{i,t}, f^{(m)}_{-i,t})$ and the payoff function of player $i$ as $\pi(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t})$.
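
The population state seen by a given player is simply the empirical distribution of everyone else's state, and can be computed directly. The following sketch assumes a finite list of player states; the function name `population_state` and the use of exact fractions are illustrative choices rather than notation from the paper.

```python
from collections import Counter
from fractions import Fraction


def population_state(states, i):
    """Empirical distribution of the states of all players except player i.

    states: list with one entry per player (requires at least two players).
    Returns a dict mapping each observed state y to the fraction of the
    other m - 1 players whose state equals y.
    """
    others = [x for j, x in enumerate(states) if j != i]
    counts = Counter(others)
    return {y: Fraction(c, len(others)) for y, c in counts.items()}


states = [0, 1, 1, 2]                 # four players
print(population_state(states, 0))    # player 0 sees states 1, 1, 2
print(population_state(states, 1))    # player 1 sees states 0, 1, 2
```

Two players generally see slightly different population states because each excludes herself; anonymity means that only these fractions, and not the identities behind them, enter payoffs and transitions.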

The examples discussed in the Introduction naturally belong to the class of anonymous stochastic 
games. For example, in the interdependent security model (Example 1), it is natural to assume that 
a single player's payoff is affected by the empirical distribution of security levels of other players in 
the network, but not by their specific identity. The same assumption is also plausible for the other 
examples presented earlier. 

For the remainder of the paper, we focus our attention on anonymous stochastic games. For ease 
of notation, we often drop the subscripts $i$ and $t$ to denote a generic transition probability measure
and a generic payoff, i.e., we denote a generic transition probability measure by $\mathbf{P}(\cdot \mid x, a, f)$ and
a generic payoff by $\pi(x, a, f)$, where $f$ represents the population state of players other than the
player under consideration. We let $\mathfrak{F}$ denote the set of all Borel probability measures on $\mathcal{X}$.

2.2. Mean Field Equilibrium 

In a game with a large number of players, we might expect that fluctuations of players' states
"average out", and hence the actual population state remains roughly constant over time. Because
the effect of other players on a single player's payoff is only via the population state, it is intuitive 
that, as the number of players increases, a single player has negligible effect on the outcome of 
the game. This intuition is formalized through the notion of mean field equilibrium (Jovanovic and 
Rosenthal 1988, Bergin and Bernhardt 1995, Hopenhayn 1992, Stokey et al. 1989, Friesz et al. 
1993, Huang et al. 2006, 2007, Lasry and Lions 2007, Weintraub et al. 2008, 2010, Adlakha et al. 
2010, Bodoh-Creed 2010). 

In mean field equilibrium, each player optimizes its payoff based on only the long-run average 
population state. Thus, rather than keep track of the exact population state, a single player's action 
depends only on her own current state as well as the long run average population state. This is 
motivated by the fact that a single player need not concern herself with the fine scale dynamics 
of competitors' specific states. Given this simplified player behavior, note that each player must 
solve a dynamic program to determine their optimal strategy; the strategy chosen by each player 
then leads to a long-run average population state. Mean field equilibrium requires that the latter 
long-run average population state matches the original conjecture made by the players.

Note that in a mean field equilibrium, because players optimize holding the population state
constant, their optimal strategies will depend only on their current state. We call such players 
oblivious, and refer to their strategies as oblivious strategies. This approach does not require players 
to be aware of each other's exact states, provided that every player is aware of the long-run average population
state. Furthermore, observe that if all players are oblivious, players' states evolve independently. 

In this section we fix an anonymous stochastic game $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$. Formally, an oblivious
strategy is a strategy that depends only on the player's current state. We let $\mathfrak{M}_O$ denote the set of
oblivious strategies.

Definition 2. Let $\mathfrak{M}_O$ be the set of oblivious strategies available to a player:

$$\mathfrak{M}_O = \{\mu : \mathcal{X} \to \mathcal{A} \mid \mu(x) \in \mathcal{A}(x) \text{ for all } x \in \mathcal{X}\}. \tag{3}$$

Given an oblivious strategy $\mu \in \mathfrak{M}_O$, a player $i$ takes an action $a_{i,t} = \mu(x_{i,t})$ at time $t$. If the
player conjectures the aggregate population state to be $f$, then she also conjectures that her next
state is randomly distributed according to the transition probability measure $\mathbf{P}$:

$$x_{i,t+1} \sim \mathbf{P}(\cdot \mid x_{i,t}, \mu(x_{i,t}), f), \tag{4}$$

where $f$ is the conjectured long run average population state.

We define the oblivious value function $V(x \mid \mu, f)$ to be the expected net present value for any
player with initial state $x$, when the long run average population state is conjectured to be $f$, and
the player uses an oblivious strategy $\mu$. We have

$$V(x \mid \mu, f) \triangleq \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t \pi(x_{i,t}, \mu(x_{i,t}), f) \,\Big|\, x_{i,0} = x;\ \mu\right]. \tag{5}$$

Given a population state $f$, a player computes an optimal strategy by maximizing their oblivious
value function. Note that because the oblivious value function does not track the evolution of the
population state, we should expect a player's optimal strategy to depend only on their current
state; that is, it must be oblivious. We capture this optimization step via the operator $\mathcal{P}$ defined next.

Define $V^*(x \mid f)$ as:

$$V^*(x \mid f) = \sup_{\mu' \in \mathfrak{M}_O} V(x \mid \mu', f).$$

Definition 3. The operator $\mathcal{P} : \mathfrak{F} \to \mathfrak{M}_O$ maps a distribution $f \in \mathfrak{F}$ to the set of optimal oblivious
strategies. That is, $\mu \in \mathcal{P}(f)$ if and only if

$$V(x \mid \mu, f) = V^*(x \mid f), \quad \forall x \in \mathcal{X}.$$

Note that in principle, $\mathcal{P}(f)$ may be empty, though we show that under our assumptions this does
not occur.
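
For intuition, the optimization behind this operator can be written recursively. Fixing $f$ turns the player's problem into a single-agent Markov decision process. Under standard conditions for discounted dynamic programming (bounded payoffs and attainment of the maximum are assumed here for illustration; they are not restated from the text), $V^*$ would satisfy the Bellman equation:

```latex
V^*(x \mid f) = \max_{a \in \mathcal{A}(x)}
  \left\{ \pi(x, a, f)
    + \beta \int_{\mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(dx' \mid x, a, f) \right\},
\qquad
\mu \text{ is optimal given } f
  \iff \mu(x) \text{ attains the maximum for every } x \in \mathcal{X}.
```

The population state $f$ enters only as a fixed parameter, which is why the maximizer depends on the player's own state alone.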

Now suppose that all players use the oblivious strategy $\mu$, and the long run average population
state $f$ drives their state dynamics. In this scenario, we expect the long run population state to be
an invariant distribution of the strategy $\mu$ under the dynamics

$$x_{t+1} \sim \mathbf{P}(\cdot \mid x_t, \mu(x_t), f). \tag{6}$$

We capture this relationship via the operator $\mathcal{D}$, defined next.

Definition 4. The operator $\mathcal{D} : \mathfrak{M}_O \times \mathfrak{F} \to \mathfrak{F}$ maps an oblivious strategy $\mu$ and a distribution $f$
to the set of invariant distributions associated with the dynamics (6).

Thus, $g \in \mathcal{D}(\mu, f)$ if and only if for all Borel sets $S \subset \mathcal{X}$,

$$g(S) = \int_{\mathcal{X}} \mathbf{P}(S \mid x, \mu(x), f)\, g(dx). \tag{7}$$

Note that the image of the operator $\mathcal{D}$ is empty if the strategy does not result in an invariant
distribution, though again, we show under our assumptions that this does not occur.

We can now define mean field equilibrium. If every agent conjectures that $f$ is the long run
population state, then every agent would prefer to play an optimal oblivious strategy $\mu$. On the
other hand, if every agent plays $\mu$ and the long run population state is indeed $f$, then $f$ must also
be an invariant distribution of (6). Thus mean field equilibrium requires a consistency condition:
the invariant distribution under $\mu$ and $f$ should be exactly $f$.

Definition 5 (Mean Field Equilibrium). A strategy $\mu$ and a distribution $f$ constitute a mean
field equilibrium if $\mu \in \mathcal{P}(f)$ and $f \in \mathcal{D}(\mu, f)$.

2.3. The Approximate Markov Equilibrium Property

A natural question that arises in the context of mean field equilibrium is whether it is a good
approximation to a game with finitely many players. Here we present a formal justification for the
notion of mean field equilibrium by considering explicitly a limiting regime where the number of
players grows large.

Recall that we defined $\Gamma$ initially as a stochastic game with $m$ players. The standard solution
concept for stochastic games is Markov perfect equilibrium. In a Markov perfect equilibrium, players'
strategies depend on their own current state, as well as the current states of others; we refer to
such strategies as cognizant strategies. This larger state space makes Markov perfect equilibrium a
much more complex equilibrium concept: Markov perfect equilibrium is typically quite challenging
to compute, and demands far greater rationality on the part of the players.

It can be shown, however, that under appropriate assumptions, a mean field equilibrium is 
approximately a Markov perfect equilibrium as the number of players grows large. Formally, let 
$(\mu, f)$ be a mean field equilibrium, and fix a single player $i$. Suppose that we consider a sequence of
games with $m \to \infty$, where all players other than player $i$ use the oblivious strategy $\mu$; and where
the initial state of all players other than player $i$ is sampled i.i.d. from $f$. Then we can show that as
$m \to \infty$, the difference between the payoff player $i$ achieves by playing $\mu$ and the maximum possible
payoff player $i$ can achieve by playing any cognizant strategy approaches zero almost surely, for all
initial states $x$ of player $i$. Thus, in particular, $\mu$ is approximately optimal for player $i$ in a large
finite game. A weaker version of this property, called the approximate Markov equilibrium property, 
was introduced by Weintraub et al. (2008); a similar notion is also studied by Glynn (2004), Huang 
et al. (2005), Tembine et al. (2009) and Bodoh-Creed (2010). 

In order for this approximation property to hold, the key requirement is that the model primitives
$\pi(x, a, f)$ and $\mathbf{P}(\cdot \mid x, a, f)$ must be jointly continuous in $a$ and $f$, and the payoff function must be
uniformly bounded. The intuition is that, essentially, the desired approximation property amounts
to a continuity property in the value function of a player. We refer the reader to our companion
paper Adlakha et al. (2010) for details of this type of result in the case of discrete state spaces.
Independently of our own work, Bodoh-Creed (2010) has also derived similar conditions to ensure 
that mean field equilibrium approximates Markov perfect equilibrium well, over compact continuous 
state spaces. 

For the remainder of the paper, we only study stochastic games $\Gamma$ in the limiting regime where
the number of players grows large. In particular, we focus on existence of, and convergence to,
mean field equilibrium. In Section 3, we establish that a mean field equilibrium always exists for 
stochastic games with complementarities. 

2.4. Lattice-Theoretic Preliminaries 

This section contains an overview of some basic definitions and notation used in the remainder
of the paper. Our development requires some basic concepts from the theory of lattices. Given a
partially ordered set $X$ with order $\succeq$ and a subset $S \subseteq X$, an element $x \in X$ is called an upper
bound of $S$ if $x \succeq y$ for all $y \in S$; similarly, $x$ is called a lower bound of $S$ if $y \succeq x$ for all $y \in S$.
We say that $x$ is a supremum or least upper bound of $S$ in $X$ if $x$ is an upper bound of $S$, and for any
other upper bound $x'$ of $S$, we have $x' \succeq x$. In this case we write $x = \sup S$. We similarly define
infimum (or greatest lower bound), and denote it by $\inf S$. The partially ordered set $(X, \succeq)$ is a
lattice if for all pairs $x, y \in X$, the elements $\sup\{x, y\}$ and $\inf\{x, y\}$ exist in $X$. The lattice
$(X, \succeq)$ is a complete lattice if in addition, for all nonempty subsets $S \subset X$, the elements $\sup S$
and $\inf S$ exist in $X$.

If $X$ is a lattice, a function $f : X \to \mathbb{R}$ is supermodular if
$f(\sup\{x, x'\}) + f(\inf\{x, x'\}) \geq f(x) + f(x')$ for every pair $x, x'$. If $Y$ is also a lattice, a
function $f : X \times Y \to \mathbb{R}$ has increasing differences in $x$ and $y$ if for all $x' \succeq x$, $y' \succeq y$,
there holds $f(x', y') - f(x', y) \geq f(x, y') - f(x, y)$. Finally, a correspondence $T : X \to Y$ is
nondecreasing if whenever $x' \succeq x$, $y \in T(x)$, and $y' \in T(x')$, there holds $\sup\{y, y'\} \in T(x')$
and $\inf\{y, y'\} \in T(x)$. (For more detail on lattice programming, the reader is
referred to Topkis (1998).)
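
To make the definition of increasing differences concrete, the following minimal Python sketch
checks the defining inequality by brute force on a finite grid. The function name
`has_increasing_differences`, the grid, and the two test functions are illustrative assumptions
and are not part of the model.

```python
import itertools

def has_increasing_differences(f, xs, ys, tol=1e-12):
    """Check f(x', y') - f(x', y) >= f(x, y') - f(x, y) for all x' > x and y' > y
    on the finite, totally ordered grids xs and ys."""
    xs, ys = sorted(xs), sorted(ys)
    for x, x_hi in itertools.combinations(xs, 2):        # pairs with x < x_hi
        for y, y_hi in itertools.combinations(ys, 2):    # pairs with y < y_hi
            gain_at_high_x = f(x_hi, y_hi) - f(x_hi, y)
            gain_at_low_x = f(x, y_hi) - f(x, y)
            if gain_at_high_x < gain_at_low_x - tol:
                return False
    return True

grid = [i / 4 for i in range(5)]
complements = has_increasing_differences(lambda x, y: x * y, grid, grid)
substitutes = has_increasing_differences(lambda x, y: -x * y, grid, grid)
print(complements, substitutes)
```

For $f(x, y) = xy$ the difference of differences equals $(x' - x)(y' - y) \geq 0$, so the check
passes; for $f(x, y) = -xy$ it fails.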

Throughout this paper, we view $X$ and $A$ as lattices in the usual ordering; since these spaces are
both compact subsets of $\mathbb{R}$, the corresponding lattices are complete (Topkis 1998). We also view
the set of strategies $\mathfrak{M}$ as a lattice, under the coordinate ordering $\succeq$: i.e., $\mu' \succeq \mu$ if and only if
$\mu'(x) \geq \mu(x)$ for all $x$.

In addition, recall that we let $\mathfrak{F}$ denote the set of all Borel probability measures on $X$. We view
$\mathfrak{F}$ as a lattice with the (first order) stochastic dominance ordering; formally, we write $f' \succeq_{SD} f$ if
and only if:

$$\int_X g(x) \, f'(dx) \geq \int_X g(x) \, f(dx)$$

for all nondecreasing, bounded, measurable functions $g$ on $X$ (where the integral is the
Riemann-Stieltjes integral). It is straightforward to show that this condition is equivalent to
$F'(x) \leq F(x)$ for all $x$, where $F$ (resp., $F'$) is the cumulative distribution function of $f$ (resp., $f'$).
It is well known that $\mathfrak{F}$ is a lattice: the lattice supremum $\sup_{SD}\{f, f'\}$ (resp., the lattice infimum
$\inf_{SD}\{f, f'\}$) is found by the pointwise infimum (resp., supremum) of the corresponding distribution
functions. Because $X$ is compact, it is straightforward using an analogous argument to verify that
$\mathfrak{F}$ is a complete lattice (Echenique 2003).
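
The stochastic dominance ordering and its lattice operations can be illustrated on a finite state
grid. The sketch below assumes distributions are given as probability vectors over an increasing
grid of states; the helper names (`dominates`, `sd_sup`, `sd_inf`) are hypothetical. It uses the
distribution function characterization above: $f' \succeq_{SD} f$ when the CDF of $f'$ lies below that of $f$.

```python
import numpy as np

def cdf(p):
    """Cumulative distribution function of a probability vector on an increasing grid."""
    return np.cumsum(p)

def dominates(p_hi, p_lo, tol=1e-12):
    """True if p_hi first order stochastically dominates p_lo."""
    return bool(np.all(cdf(p_hi) <= cdf(p_lo) + tol))

def pmf_from_cdf(F):
    """Recover the probability vector from its CDF values."""
    return np.diff(np.concatenate(([0.0], F)))

def sd_sup(p, q):
    """Lattice supremum: pointwise infimum of the two CDFs."""
    return pmf_from_cdf(np.minimum(cdf(p), cdf(q)))

def sd_inf(p, q):
    """Lattice infimum: pointwise supremum of the two CDFs."""
    return pmf_from_cdf(np.maximum(cdf(p), cdf(q)))

# Two distributions on a three-point grid that are not comparable to each other.
p = np.array([0.5, 0.0, 0.5])
q = np.array([0.0, 1.0, 0.0])
print(dominates(p, q), dominates(q, p))
print(sd_sup(p, q), sd_inf(p, q))
```

Here neither distribution dominates the other, yet both are bounded above by `sd_sup(p, q)` and
below by `sd_inf(p, q)`, which is the lattice property used in the sequel.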

We conclude by defining some properties of parameterized distributions we require in the sequel.
Let $f(\cdot \mid y)$ denote a family of measures in $\mathfrak{F}$ parameterized by $y \in Y$, where $Y$ is a lattice. Then
we say $f$ is stochastically nondecreasing in $y$ if whenever $y' \succeq y$, $f(\cdot \mid y') \succeq_{SD} f(\cdot \mid y)$.
Similarly, let $f(\cdot \mid y, z)$ denote a family of measures in $\mathfrak{F}$ parameterized by $y \in Y$ and $z \in Z$, where
both $Y$ and $Z$ are lattices. Then we say that $f$ has stochastically increasing differences in $y$ and $z$ if
the expectation $\int_X g(x) \, f(dx \mid y, z)$ has increasing differences in $y$ and $z$, for every nondecreasing,
bounded, measurable function $g$ on $X$.

3. Existence of Mean Field Equilibria

In this section and the following section, we consider a baseline model of stochastic games with 
complementarities, in which we prove existence and convergence results. In this section we establish 
our first main result: that there exists a mean field equilibrium for the stochastic game with 
complementarities. We also show an ordering result: there exists a "largest" and a "smallest" 
equilibrium among the set of all mean field equilibria with nondecreasing strategies. 
We have the following definition. 

Definition 6. A stochastic game with complementarities is a stochastic game $\Gamma = (X, A, \mathbf{P}, \pi, \beta)$
that satisfies the following properties.

1. Nondecreasing and supermodular payoff. The payoff $\pi(x, a, f)$ is nondecreasing in $x$, continuous
in $a$, and supermodular in $(x, a)$. Furthermore, for fixed $a$ and $f$, $\sup_{x \in X} \pi(x, a, f) < \infty$.

2. Payoff complementarity. The payoff function $\pi(x, a, f)$ has increasing differences in $(x, a)$
and $f$.

3. Monotone and supermodular transition kernel. The transition kernel $\mathbf{P}(\cdot \mid x, a, f)$ is stochastically
supermodular in $(x, a)$ and is stochastically nondecreasing in each of $x$, $a$, and $f$. Further,
$\mathbf{P}(\cdot \mid x, a, f)$ is continuous in $a$ (w.r.t. the topology of weak convergence on $\mathfrak{F}$).

4. Transition kernel complementarity. The transition kernel $\mathbf{P}(\cdot \mid x, a, f)$ has stochastically
increasing differences in $(x, a)$ and $f$.

5. Monotone action set. The correspondence $A(x)$ is nondecreasing in $x$. Further,
$\sup_{a \in A(x)} \pi(x, a, f)$ is nondecreasing in $x$ for all fixed $f$.

6. Countable noise. For each $x$, $a$, and $f$, the support $\{x' : \mathbf{P}(x' \mid x, a, f) > 0\}$ is countable.

The first assumption is natural for a range of models — if larger states are more valuable, then 
the payoff function will be nondecreasing in the state. The boundedness assumption on the payoff 
will be trivially satisfied if, e.g., X is an interval and the payoff is continuous in x. The second 
assumption ensures that there are complementarities between the state and action of a single player 
and the population state of other players. The next three assumptions create complementarities 
between state and action, as well as ensure that larger states and/or larger actions now are more 
likely to lead to larger states in the future. The last assumption is made to simplify later dynamic 
programming arguments; in particular, it allows us to ignore measurability issues when considering 
optimal strategies (Bertsekas and Shreve 1978). We note that if the payoff and transition kernel are
continuous, then countability becomes unnecessary for our analysis, since we can restrict attention
to optimal strategies that are continuous in the state.

While it may be straightforward to verify whether a payoff function exhibits the desired
complementarity properties, the same verification is somewhat more challenging for the transition
kernel. Thus before continuing, we provide an example of a transition kernel that exhibits the
complementarity conditions required in Definition 6.

Example 6 (Mixture dynamics). Suppose that $\mathbf{P}(\cdot \mid x, a, f)$ is defined as follows:

$$\mathbf{P}(\cdot \mid x, a, f) = q(x, a, f) F(\cdot) + (1 - q(x, a, f)) G(\cdot). \tag{8}$$

Here $F$ and $G$ are both distributions on $X$, such that $F$ first order stochastically dominates $G$,
and $0 \leq q(x, a, f) \leq 1$. If $q(x, a, f)$ is nondecreasing in $x$, $a$, and $f$, supermodular in $(x, a)$, and has
increasing differences in $(x, a)$ and $f$, then it can be checked that the expectation of (8) against
any nondecreasing function satisfies all the conditions of Definition 6. As one example of a $q$ that
satisfies these properties, suppose:

$$q(x, a, f) = \frac{x + a + \eta(f)}{2 \sup X + \sup A},$$

where $\eta(f) = \int_X x' \, f(dx')$ is the mean of $f$. Such dynamics are commonly used in the context of
games with strategic complementarities (Curtat 1996). □
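
As a numerical illustration of Example 6, the sketch below builds the mixture kernel (8) on a small
finite grid and checks that it is stochastically nondecreasing in the state, the action, and the
population state. The grid, the value of $\sup A$, and the two distributions `F` and `G` are
hypothetical choices made only for this sketch; states and actions are assumed to lie in $[0, 1]$
so that $q$ takes values in $[0, 1]$.

```python
import numpy as np

X = np.array([0.0, 0.5, 1.0])      # hypothetical finite state grid
A_MAX = 1.0                        # hypothetical value of sup A
F = np.array([0.1, 0.3, 0.6])      # "high" distribution on X
G = np.array([0.6, 0.3, 0.1])      # "low" distribution on X; F dominates G

def q(x, a, f):
    """Mixture weight from Example 6, using the mean of the population state f."""
    eta = float(X @ f)
    return (x + a + eta) / (2 * X.max() + A_MAX)

def kernel(x, a, f):
    """Transition kernel (8): a mixture of F and G with weight q(x, a, f)."""
    w = q(x, a, f)
    return w * F + (1 - w) * G

def dominates(p_hi, p_lo, tol=1e-12):
    """First order stochastic dominance via cumulative distribution functions."""
    return bool(np.all(np.cumsum(p_hi) <= np.cumsum(p_lo) + tol))

f_low = np.array([1.0, 0.0, 0.0])   # all mass on the lowest state
f_high = np.array([0.0, 0.0, 1.0])  # all mass on the highest state

print(dominates(kernel(1.0, 0.5, f_low), kernel(0.0, 0.5, f_low)))    # larger state
print(dominates(kernel(0.5, 1.0, f_low), kernel(0.5, 0.0, f_low)))    # larger action
print(dominates(kernel(0.5, 0.5, f_high), kernel(0.5, 0.5, f_low)))   # larger population state
```

Because the CDF of the mixture equals the CDF of $G$ minus $q$ times a nonnegative gap, any increase
in $q$ shifts the next-state distribution upward in the stochastic dominance order.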

Informally, how might we expect players to behave in such a game? Observe that if other players 
have a larger population state, this increases the return to a larger state for a given player. In order 
to achieve a larger state, a player must take a larger action; but this also increases the likelihood of 
larger states in the future. All these effects conspire to create a situation where, when players are 
confronted with larger population states, they are likely to take higher actions. This monotonicity 
drives our analysis. 

For the remainder of the section we fix a stochastic game with complementarities
$\Gamma = (X, A, \mathbf{P}, \pi, \beta)$. Let $\Phi : \mathfrak{F} \to \mathfrak{F}$ denote the composition of $\mathcal{D}$ and $\mathcal{P}$ for the game $\Gamma$:
$\Phi(f) = \mathcal{D}(\mathcal{P}(f), f)$. A fixed point of $\Phi$ identifies a mean field equilibrium of $\Gamma$. Intuitively, under
the assumptions we have made we might expect $\Phi$ to be a monotone map; i.e., larger initial conjectures
about the population state should lead players to take higher actions, which should in turn lead to a
larger invariant distribution. Tarski's fixed point theorem ensures that monotone functions on a
complete lattice have a fixed point.¹

Theorem 1 (Tarski 1955). Suppose that $C$ is a nonempty complete lattice, and $T : C \to C$ is a
nondecreasing function. Then the set of fixed points of $T$ is a nonempty complete lattice.
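
On a finite chain, the content of Theorem 1 can be seen by direct iteration: applying a
nondecreasing map repeatedly from the bottom element reaches the least fixed point, and from the
top element the greatest. The sketch below is only an illustration on a hypothetical six-point
chain; on general complete lattices plain iteration need not terminate, and additional continuity
is typically needed for it to reach a fixed point.

```python
def least_and_greatest_fixed_points(T, bottom, top):
    """Iterate a nondecreasing map T on a finite chain from its bottom and top elements."""
    def iterate(x):
        while T(x) != x:
            x = T(x)
        return x
    return iterate(bottom), iterate(top)

# Hypothetical nondecreasing map on the chain {0, 1, ..., 5}, given as a lookup table.
table = [1, 1, 3, 3, 3, 5]
lo, hi = least_and_greatest_fixed_points(lambda x: table[x], 0, 5)
fixed_points = [x for x in range(6) if table[x] == x]
print(lo, hi, fixed_points)
```

The best response dynamics of Section 4 follow the same pattern, with the population state in the
role of $x$.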

¹ Note that although Tarski's theorem applies to functions, in our case $\Phi$ is a correspondence. Zhou (1994) provides
a generalization of Tarski's theorem to correspondences.

We proceed to show that $\Phi$ is monotone by showing that each of the two correspondences $\mathcal{P}$ and
$\mathcal{D}$ is monotone (with respect to the coordinate ordering on strategies in $\mathfrak{M}_O$, and the first order
stochastic dominance ordering on $\mathfrak{F}$).

Our main result in this section is the following theorem. 

Theorem 2. There exists a mean field equilibrium for the stochastic game with complementarities $\Gamma$.

In the next section, we sketch a proof of this theorem; and in Section 3.2, we show that if we 
restrict attention to equilibria where the strategy is nondecreasing, then there exists a "largest" 
equilibrium and a "smallest" equilibrium. 

3.1. Theorem 2: Proof Sketch 

We sketch the proof of Theorem 2; each step is filled in by the lemmas in the appendix. 

Step 1. We show $\mathcal{P}(f)$ is nonempty, and that optimal strategies can be identified via Bellman's
equation (Lemma 2).

Step 2. We show that the value function $V^*(x \mid f)$ is nondecreasing in $x$ and has increasing
differences in $x$ and $f$. We use this fact to show that:

$$\pi(x, a, f) + \beta \int_X V^*(x' \mid f) \, \mathbf{P}(dx' \mid x, a, f)$$

is supermodular in $(x, a)$ and has increasing differences in $(x, a)$ and $f$ (Lemmas 3, 4, and 5).

Step 3. We use the complementarity properties of the previous step to show that the strategies
$\overline{p}(f)$ and $\underline{p}(f)$ are nondecreasing in the state $x$, where:

$$\overline{p}(f) = \sup \mathcal{P}(f); \quad \text{and} \quad \underline{p}(f) = \inf \mathcal{P}(f). \tag{9}$$

We also show that $\overline{p}$ and $\underline{p}$ are nondecreasing in $f$. (These facts are shown in Lemma 6.)²

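Step 4. We show that when restricted to strategies $\mu$ that are nondecreasing in state, $\overline{d}(\mu, f)$
and $\underline{d}(\mu, f)$ are nondecreasing in $\mu$ and $f$, where:

$$\overline{d}(\mu, f) = \sup \mathcal{D}(\mu, f); \quad \text{and} \quad \underline{d}(\mu, f) = \inf \mathcal{D}(\mu, f). \tag{10}$$

(This is shown in Lemmas 7 and 8.)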

² See also Hopenhayn and Prescott (1992), Topkis (1998) and Smith and McCardle (2002) for other conditions that
yield monotonicity of optimal solutions to dynamic programs.

Step 5. We conclude that the functions $\overline{\Phi}(f)$ and $\underline{\Phi}(f)$ are nondecreasing in $f$, where:

$$\overline{\Phi}(f) = \overline{d}(\overline{p}(f), f); \qquad \underline{\Phi}(f) = \underline{d}(\underline{p}(f), f). \tag{11}$$

Thus both $\overline{\Phi}$ and $\underline{\Phi}$ possess fixed points by Tarski's theorem (Lemma 9). These fixed points
identify mean field equilibria.

3.2. Largest and Smallest Equilibria 

Typically in games with supermodular structure, it is possible to show various ordering relationships
among the equilibria. In particular, there is typically a "largest" and "smallest" equilibrium
(Milgrom and Roberts 1990). In our setting, we might conjecture that the largest fixed point of
$\overline{\Phi}$ (resp., the smallest fixed point of $\underline{\Phi}$) is the largest (resp., the smallest) mean field equilibrium of
the stochastic game $\Gamma$. However, this need not be the case: as seen above, monotonicity properties
of the map $\mathcal{D}$ are only inferred on the subset of strategies that are nondecreasing in the state.
In general, such monotonicity properties might not hold over the entire strategy set, i.e., $\overline{d}(\mu, g)$
and $\underline{d}(\mu, g)$ may not be nondecreasing over the entire set $\mathfrak{M}$. These monotonicity properties are
necessary for establishing the ordering of equilibria in classical supermodular game theory.

From the discussion in the preceding paragraph, however, observe that if we restrict attention to
nondecreasing strategies, then indeed an ordering result can be proven. In particular, the following
corollary shows that any mean field equilibrium where the strategy is nondecreasing is bounded
above by the largest fixed point of $\overline{\Phi}$ and bounded below by the smallest fixed point of $\underline{\Phi}$.

COROLLARY 1. Let $\overline{f}$ be the largest fixed point of $\overline{\Phi}$, and let $\underline{f}$ be the smallest fixed point of
$\underline{\Phi}$, i.e.:

$$\overline{f} = \sup\{f : \overline{\Phi}(f) = f\}; \qquad \underline{f} = \inf\{f : \underline{\Phi}(f) = f\}. \tag{12}$$

Let $(\mu, f)$ be any mean field equilibrium of the stochastic game with complementarities $\Gamma$, where $\mu$
is nondecreasing. Then $\underline{f} \preceq_{SD} f \preceq_{SD} \overline{f}$, and thus $\underline{p}(\underline{f}) \preceq \mu \preceq \overline{p}(\overline{f})$.

4. Convergence to Equilibrium

In this section we show that a mean field equilibrium can be obtained using a natural form of
learning dynamics among the players. We start by considering a simple form of best response
dynamics to compute equilibria, where we iteratively apply the maps $\overline{\Phi}$ and $\underline{\Phi}$ defined in (11). We
argue that this process is unsatisfactory, both from a computational and modeling standpoint, and
instead propose an alternate process we refer to as myopic learning dynamics; these dynamics are
both computationally simpler and correspond to a natural learning behavior among the agents.
We show that this process converges to mean field equilibria.

We fix a stochastic game with complementarities $\Gamma = (X, A, \mathbf{P}, \pi, \beta)$. Throughout this section
we study $\Gamma$ in the limit of a continuum of agents, consistent with our definition of mean field
equilibrium.

4.1. Best Response Dynamics 

We start by considering the following algorithm. 

Algorithm L-BRD:

1. Initialize the state of every agent to $\underline{x} = \inf X$, and let $f_0$ denote the resulting population
state, i.e., $f_0$ places all its mass on $\underline{x}$.

2. At time $t$, let $\mu_t = \underline{p}(f_t)$, and let $f_{t+1} = \underline{d}(\mu_t, f_t)$, cf. (9) and (10).

3. Repeat (2).

Here L-BRD denotes lower best response dynamics. Given a current population state, we compute
the lowest best response of a player, and then compute the smallest invariant distribution
corresponding to the resulting strategy. This is the simplest dynamic we might consider; since
$\underline{\Phi}(f) = \underline{d}(\underline{p}(f), f)$, we have $f_{t+1} = \underline{\Phi}(f_t)$. In spirit, this algorithm is similar to other best response
dynamics that are common in the literature on supermodular games (Milgrom and Roberts 1990,
Vives 1990).

We now show that this algorithm converges; and further, under an appropriate continuity con- 
dition, the limit point is the smallest mean field equilibrium. We have the following assumption. 

ASSUMPTION 1. The payoff function $\pi(x, a, f)$ and the transition probability measure $\mathbf{P}(\cdot \mid x, a, f)$
are both jointly continuous in their domains (where we endow $\mathfrak{F}$ with the topology of weak
convergence).

The next proposition shows L-BRD converges; the proof follows by exploiting monotonicity of $\underline{\Phi}$.

PROPOSITION 1. Let $\Gamma$ be a stochastic game with complementarities. Define $f_t$ and $\mu_t$ iteratively
according to Algorithm L-BRD. Then $f_0 \preceq_{SD} f_1 \preceq_{SD} f_2 \preceq_{SD} \cdots$, and $\mu_0 \preceq \mu_1 \preceq \mu_2 \preceq \cdots$.
Further, there exists a distribution $f^*$ and a strategy $\mu^*$, nondecreasing in $x$, such that $f_t$ converges
weakly to $f^*$ as $t \to \infty$, and $\mu_t$ converges pointwise to $\mu^*$ as $t \to \infty$.

If, in addition, Assumption 1 holds, then $(\mu^*, f^*)$ is a mean field equilibrium, and $f^*$ is $\underline{f}$, the
smallest fixed point of $\underline{\Phi}$ (cf. (12)).

Thus under mild continuity conditions on the model primitives, best response dynamics converge 
to a mean field equilibrium. Further, the limit point is the smallest mean field equilibrium among 
all those where the equilibrium strategy is nondecreasing. 
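
A minimal sketch of L-BRD on a finite grid is given below. All primitives are hypothetical choices
made for illustration: the payoff $x(1 + \eta(f)) - a^2/2$, the mixture kernel of Example 6, and the
grids for states and actions. Two simplifications are also assumptions of the sketch: the lowest
best response is found by value iteration with ties broken toward the smallest action, and the
smallest invariant distribution is approximated by iterating the induced chain from the point mass
at $\inf X$.

```python
import numpy as np

# Hypothetical primitives on finite grids (assumptions of this sketch).
X = np.linspace(0.0, 1.0, 5)                  # state grid
A = np.linspace(0.0, 1.0, 5)                  # action grid, the same for every state
BETA = 0.9                                    # discount factor
F = np.array([0.0, 0.05, 0.15, 0.3, 0.5])     # F first order stochastically dominates G
G = np.array([0.5, 0.3, 0.15, 0.05, 0.0])

def mean(f):
    return float(X @ f)

def payoff(x, a, f):
    # nondecreasing in x, with increasing differences in x and the mean of f
    return x * (1.0 + mean(f)) - 0.5 * a ** 2

def kernel(x, a, f):
    q = (x + a + mean(f)) / 3.0               # mixture weight as in Example 6
    return q * F + (1.0 - q) * G

def lowest_best_response(f, iters=300, tol=1e-9):
    """Value iteration for a fixed population state f; ties go to the smallest action."""
    R = np.array([[payoff(x, a, f) for a in A] for x in X])
    P = np.array([[kernel(x, a, f) for a in A] for x in X])   # shape (|X|, |A|, |X|)
    V = np.zeros(len(X))
    for _ in range(iters):
        Q = R + BETA * (P @ V)
        V = Q.max(axis=1)
    idx = [int(np.argmax(row >= row.max() - tol)) for row in Q]
    return A[idx]

def smallest_invariant(mu, f, iters=200):
    """Iterate the chain induced by (mu, f), starting from the point mass at inf X."""
    P = np.array([kernel(x, a, f) for x, a in zip(X, mu)])
    g = np.zeros(len(X))
    g[0] = 1.0
    for _ in range(iters):
        g = g @ P
    return g

def l_brd(steps=20):
    f = np.zeros(len(X))
    f[0] = 1.0                                # f_0 places all its mass on inf X
    path = [f]
    for _ in range(steps):
        mu = lowest_best_response(f)          # mu_t: lowest best response to f_t
        f = smallest_invariant(mu, f)         # f_{t+1}: invariant distribution under (mu_t, f_t)
        path.append(f)
    return mu, path

mu_star, path = l_brd()
print(np.round(path[-1], 4), mu_star)
```

In this toy model the population states along `path` increase in the stochastic dominance order,
as Proposition 1 predicts, and the final iterate is numerically a fixed point of the update.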

We conclude by noting that we can analogously define an upper best response dynamic as 
follows. 

Algorithm U-BRD:

1. Initialize the state of every agent to $\overline{x} = \sup X$; let $f_0$ denote the resulting population state,
i.e., $f_0$ places all its mass on $\overline{x}$.

2. At time $t$, let $\mu_t = \overline{p}(f_t)$, and let $f_{t+1} = \overline{d}(\mu_t, f_t)$, cf. (9) and (10).

3. Repeat (2).

The same conclusion as Proposition 1 holds for U-BRD as well, except that under Assumption 1,
the limit point is the largest fixed point of $\overline{\Phi}$, i.e., $f^* = \overline{f}$ (cf. (12)).

We note that one alternative to L-BRD and U-BRD is presented by Sleet (2001). He suggests 
an algorithm based on iterative value and policy iteration to compute a mean field equilibrium of 
a dynamic price-setting game with stochastic, exogenous firm-specific demand shocks per period. 
The setting considered there is specialized, but the convergence proof also exploits monotonicity 
properties induced by complementarity conditions in that specific model. 

4.2. Myopic Learning Dynamics 

The preceding section establishes the desirable result that best response dynamics converge. However,
in a dynamic context, iterative application of $\overline{\Phi}$ and $\underline{\Phi}$ is not completely satisfactory, whether
viewed from a computational or modeling standpoint. First, given $f$, computing $\overline{\Phi}(f)$ or $\underline{\Phi}(f)$
requires computing the invariant distribution of the Markov chain induced by $\overline{p}(f)$ or $\underline{p}(f)$,
introducing additional complexity. Second, the process of iteratively applying $\overline{\Phi}$ or $\underline{\Phi}$ does not
naturally correspond to any reasonable dynamic process that agents are likely to follow in practice: it is
difficult to imagine an agent first computing the invariant distribution of the current strategy in
use by her competitors, and then solving a dynamic program given that invariant distribution.

By contrast, in this section we present a pair of myopic learning dynamics that address these
considerations. The algorithms presented in this section are simple and easy to implement. Fur- 
thermore, they demand only a weak form of rationality from the players, thereby resolving the 
two main issues of computability and plausibility associated with the standard solution concept of 
Markov perfect equilibrium (as discussed in the Introduction). 

In the myopic learning dynamic, at each time $t$, each agent computes a best response to the
current population state distribution $f_t$, assuming that the population state will remain at $f_t$ at all
future times. (This step is similar to model predictive control or receding horizon control; see, e.g.,
Garcia et al. (1989).) In other words, agents play according to a strategy in $\mathcal{P}(f_t)$. This play yields
a new population state $f_{t+1}$ at the next time step according to the transition kernel.

The algorithms we consider are reasonable in settings where agents are not likely to predict 
future learning by other agents. Indeed, such an assumption seems plausible precisely in the large 
systems that mean field equilibrium is meant to model. In such systems, myopic behavior is simple 
computationally; by contrast, solving a dynamic program with full knowledge of future strategies 
other agents will employ places unreasonable informational requirements on the agents. 

We first consider an algorithm where agents play actions induced by $\underline{p}$.

Algorithm L-MLD:

1. Every agent initializes their state to $\underline{x} = \inf X$ at time $t = 0$.

2. Agents observe the population state $f_t$.

3. An agent with state $x$ chooses the action $a_t$ so that $a_t = \mu_t(x)$, where $\mu_t(x) = \underline{p}(f_t)(x)$. The
agent's next state is distributed according to $\mathbf{P}(\cdot \mid x, a_t, f_t)$.

4. Repeat (2)-(3).

Here L-MLD denotes lower myopic learning dynamics. Observe that agents compute a new 
strategy based on the observed current population state — not based on the invariant distribution 
associated to the last strategy chosen. This means that two simultaneous dynamic processes are 
taking place: strategy revision on the part of the players, but also state update via the system 
dynamics (4). Due to this intertwined dynamic, novel arguments are required to prove convergence 
of best response dynamics (relative to usual proofs of convergence for such dynamics in supermod- 
ular games, e.g., Milgrom and Roberts 1990, Vives 1990). We also note that although the same 
strategy is computed by every agent, the particular action chosen will vary depending on their
current state.

The preceding description yields a simple recursion for the population state at the next time
step; for all Borel sets $S$:

$$f_{t+1}(S) = \int_X \mathbf{P}(S \mid x', \mu_t(x'), f_t) \, f_t(dx') = Q_{\mu_t, f_t}(f_t)(S), \tag{13}$$

where $Q_{\mu, f}(g)$ is defined as follows:

$$Q_{\mu, f}(g)(S) = \int_X \mathbf{P}(S \mid x, \mu(x), f) \, g(dx). \tag{14}$$

Our goal is to understand the behavior of the sequence of population states $f_0, f_1, f_2, \ldots$, as
well as the sequence of policies $\mu_0, \mu_1, \mu_2, \ldots$. We have the following proposition, which mirrors
Proposition 1.

PROPOSITION 2. Let $\Gamma$ be a stochastic game with complementarities. Define $f_t$ and $\mu_t$ iteratively
according to Algorithm L-MLD. Then $f_0 \preceq_{SD} f_1 \preceq_{SD} f_2 \preceq_{SD} \cdots$, and $\mu_0 \preceq \mu_1 \preceq \mu_2 \preceq \cdots$.
Further, there exists a distribution $f^*$ and a strategy $\mu^*$, nondecreasing in $x$, such that $f_t$ converges
weakly to $f^*$ as $t \to \infty$, and $\mu_t$ converges pointwise to $\mu^*$ as $t \to \infty$.

If in addition Assumption 1 holds, then $(\mu^*, f^*)$ is a mean field equilibrium, and $f^*$ is $\underline{f}$, the
smallest fixed point of $\underline{\Phi}$ (cf. (12)).

Thus we find the same result as for L-BRD: under mild continuity conditions on the model 
primitives, the dynamics converge to the smallest mean field equilibrium among all those where 
the equilibrium strategy is nondecreasing. 

The proof of Proposition 2 proceeds as follows. We exploit two key monotonicity properties
established in the course of proving existence of an equilibrium (Theorem 2): first, that $\underline{p}(f)$ is
monotone in $f$ (Lemma 6 in the appendix); and second, that $Q_{\mu, f}(g)(S)$ is monotone in $\mu$, $f$, and
$g$ (Lemma 7 in the appendix). These two properties together allow us to establish that $\mu_t$ and $f_t$
form monotone sequences: even though players are reacting only to the current population state,
the population state over time moves monotonically towards an equilibrium.
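
The recursion (13) is straightforward to simulate on a finite grid. The sketch below reuses the
same hypothetical primitives as a toy model (payoff $x(1 + \eta(f)) - a^2/2$ and the mixture kernel
of Example 6); these choices, the grids, and the function names are assumptions made for
illustration. Unlike best response dynamics, each step applies the kernel only once and never
computes an invariant distribution.

```python
import numpy as np

# Hypothetical primitives on finite grids (assumptions of this sketch).
X = np.linspace(0.0, 1.0, 5)                  # state grid
A = np.linspace(0.0, 1.0, 5)                  # action grid, the same for every state
BETA = 0.9                                    # discount factor
F = np.array([0.0, 0.05, 0.15, 0.3, 0.5])     # F first order stochastically dominates G
G = np.array([0.5, 0.3, 0.15, 0.05, 0.0])

def mean(f):
    return float(X @ f)

def payoff(x, a, f):
    return x * (1.0 + mean(f)) - 0.5 * a ** 2

def kernel(x, a, f):
    q = (x + a + mean(f)) / 3.0               # mixture weight as in Example 6
    return q * F + (1.0 - q) * G

def lowest_best_response(f, iters=300, tol=1e-9):
    """Best response assuming the population state stays at f forever (value iteration)."""
    R = np.array([[payoff(x, a, f) for a in A] for x in X])
    P = np.array([[kernel(x, a, f) for a in A] for x in X])
    V = np.zeros(len(X))
    for _ in range(iters):
        Q = R + BETA * (P @ V)
        V = Q.max(axis=1)
    idx = [int(np.argmax(row >= row.max() - tol)) for row in Q]
    return A[idx]

def Q_op(mu, f, g):
    """Operator (14): push g one step through the kernel induced by strategy mu and state f."""
    P = np.array([kernel(x, a, f) for x, a in zip(X, mu)])
    return g @ P

def l_mld(steps=40):
    f = np.zeros(len(X))
    f[0] = 1.0                                # every agent starts at inf X
    path = [f]
    for _ in range(steps):
        mu = lowest_best_response(f)          # strategy revision against the current f_t
        f = Q_op(mu, f, f)                    # recursion (13): one step of the state dynamics
        path.append(f)
    return mu, path

mu_star, path = l_mld()
print(np.round(path[-1], 4), mu_star)
```

In this toy model the iterates again increase in the stochastic dominance order and settle at a
population state that is left unchanged by one further step of strategy revision and state update.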

Note that L-MLD initializes players to the lowest state, inf X. This behavior of L-MLD is par- 
ticularly meaningful for several of the applications described in the Introduction; for example, in 
an interdependent security setting, we might envision a scenario where a new, more efficient tech- 
nology for security is introduced. In this case the "low" initial population state might correspond 
to the status quo, and then the myopic learning dynamics track the adaptation of the population
to a new equilibrium configuration.

A similar convergence result also holds if instead every agent starts at the largest state
$\overline{x} = \sup X$, and follows the strategy $\overline{p}(f_t)$ at each time step. We call this Algorithm U-MLD.

Algorithm U-MLD:

1. Every agent initializes their state to $\overline{x} = \sup X$ at time $t = 0$.

2. Agents observe the population state $f_t$.

3. An agent with state $x$ chooses the action $a_t$ so that $a_t = \mu_t(x)$, where $\mu_t(x) = \overline{p}(f_t)(x)$. The
agent's next state is distributed according to $\mathbf{P}(\cdot \mid x, a_t, f_t)$.

4. Repeat (2)-(3).

Note that (13) continues to hold, with $\mu_t$ chosen according to the preceding algorithm. The same
conclusion as Proposition 2 holds for U-MLD as well, except that under Assumption 1, the limit
point is the largest fixed point of $\overline{\Phi}$, i.e., $f^* = \overline{f}$ (cf. (12)).

We conclude this section by discussing the behavior of myopic learning dynamics in finite systems. 
In particular, suppose that in a game consisting of m players, each player follows the dynamic 
prescribed by L-MLD: each player starts in the lowest state, and then at each time step, observes the 
current population state and plays one step according to the optimal oblivious strategy given that 
population state. Because the system is finite, additional error is introduced due to the randomness 
in state transitions of individual agents; in particular, due to this randomness, it is not immediately 
guaranteed that myopic learning dynamics will converge to a mean field equilibrium in a finite 
game. However, if the state space is discrete, then using techniques similar to Adlakha et al. (2010)
it can be shown that $f_t^{(m)} \to f_t$ weakly, almost surely, as $m \to \infty$, where $f_t^{(m)}$ is the population
state after $t$ time steps with $m$ players, and $f_t$ is the population state in the L-MLD dynamic after $t$ time steps
in the mean field limit. Thus after sufficiently many time steps and for sufficiently large finite 
systems, the population state under L-MLD converges approximately to a mean field equilibrium 
population state. We illustrate this point later in Section 9. 

5. Comparative Statics 

In this section we discuss sensitivity analysis of equilibria, also known as comparative statics results. 
Our goal is to understand how the equilibrium distribution and optimal strategy are altered in 
response to changes in parameters. These results allow us to evaluate changes in equilibrium with 
respect to changes in a parameter. 

In this section we consider a family of stochastic games with complementarities, parameterized
by $\theta \in \Theta$, where $\Theta$ is a complete lattice. In the context of security games, this parameter could, for
example, represent the effectiveness of a particular security technology. Alternatively, in the context
of recommendation systems, $\theta$ might represent the effectiveness of the collaborative filtering engine
in improving recommendations to one agent based on the profiles of other agents.

Formally, suppose we are given a family of stochastic games $\Gamma(\theta)$ for $\theta \in \Theta$ with common strategy
spaces, action spaces, and discount factors, where for each $\theta$, $\Gamma(\theta)$ is a stochastic game with
complementarities, i.e., $\Gamma(\theta)$ satisfies Definition 6 for each $\theta \in \Theta$. We refer to $\Gamma$ as a parametric
family of stochastic games with complementarities. Let $\pi(x, a, f; \theta)$ and $\mathbf{P}(\cdot \mid x, a, f; \theta)$ be the payoff
and transition kernel, respectively, in $\Gamma(\theta)$. We make the following assumption.

Assumption 2. The payoff $\pi(x, a, f; \theta)$ has increasing differences in $(x, a, f)$ and $\theta$. The transition
kernel $\mathbf{P}(\cdot \mid x, a, f; \theta)$ has stochastically increasing differences in $(x, a, f)$ and $\theta$, and is stochastically
nondecreasing in $\theta$ for fixed $x, a, f$.

Under the preceding assumption, we can give a directional characterization of the movement of 
equilibrium in response to parameter changes. 

Theorem 3. Let $\Gamma$ be a parametric family of stochastic games with complementarities, and suppose
that Assumption 2 holds. Let $\underline{f}(\theta)$ and $\overline{f}(\theta)$ denote the "smallest" and "largest" equilibrium in the
game $\Gamma(\theta)$, cf. (12). Then $\underline{f}(\theta)$ and $\overline{f}(\theta)$ are both nondecreasing in $\theta$.

Such comparative statics results are commonly applied in the context of games with complemen- 
tarities; but it is worth noting that in a dynamic context this result provides additional insight, 
because it quantifies how the distribution of agents' states will respond as a parameter changes. 
This kind of insight is particularly valuable for system designers, regulators, and policy makers, 
where changes in equilibrium behavior due to control decisions may be challenging to characterize. 
As one simple consequence of the preceding theorem, suppose that in security games, an incentive 
is introduced for agents to invest in security as a linear rebate in the payoff, proportional to an 
agent's security level x. It is straightforward to check that this results in more players opting for 
higher investment, and thus the equilibrium population state tends to shift towards higher security 
levels. 
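
As a worked check of this claim, suppose (as an assumption of this sketch) that the rebate enters
additively, so that the payoff in $\Gamma(\theta)$ is $\pi(x, a, f; \theta) = \pi(x, a, f) + \theta x$ with $\theta \geq 0$, and
that the transition kernel does not depend on $\theta$. For $\theta' \geq \theta$ and $(x', a', f') \succeq (x, a, f)$, the
payoff condition of Assumption 2 can then be verified directly:

```latex
% Assumed rebate form: \pi(x,a,f;\theta) = \pi(x,a,f) + \theta x, with \theta' \geq \theta and x' \geq x.
\begin{align*}
&\bigl[\pi(x',a',f';\theta') - \pi(x',a',f';\theta)\bigr]
 - \bigl[\pi(x,a,f;\theta') - \pi(x,a,f;\theta)\bigr] \\
&\qquad = (\theta' - \theta)\,x' - (\theta' - \theta)\,x
 = (\theta' - \theta)(x' - x) \;\geq\; 0 .
\end{align*}
```

Adding $\theta x$ with $\theta \geq 0$ also preserves the monotonicity and supermodularity requirements of
Definition 6, and a kernel that is independent of $\theta$ satisfies the kernel conditions of Assumption 2
trivially. Under these assumptions, Theorem 3 implies that $\underline{f}(\theta)$ and $\overline{f}(\theta)$ are nondecreasing in the
rebate rate $\theta$.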

6. Coupling Through Actions 

In the stochastic game model considered thus far, players' payoffs and dynamics are "coupled"
through their states; formally, $\pi(x, a, f)$ and $\mathbf{P}(\cdot \mid x, a, f)$ depend on the population state $f$, which
is in turn a distribution over the state space $X$. In many models, however, the coupling between
agents is through their actions, rather than states; that is, $f$ is a distribution over the action set $A$,
rather than over the state space. Our analysis extends rather easily to models of this form; in this
section we briefly discuss existence of, and convergence to, mean field equilibrium in such models.

Formally, an action-coupled stochastic game $\Gamma = (X, A, \mathbf{P}, \pi, \beta)$ has the following distinctions from
the (state-coupled) stochastic game defined in Section 2.

Population action distribution. We define the population action distribution as follows. Let
$\alpha_{-i,t}(a)$ denote the fraction of players (excluding player $i$) that play action $a$ at time $t$, i.e.:

$$\alpha_{-i,t}(a) = \frac{1}{m-1} \sum_{j \neq i} \mathbf{1}_{\{a_{j,t} = a\}},$$

where $\mathbf{1}_{\{a_{j,t} = a\}}$ is the indicator function that the action of player $j$ at time $t$ is $a$. We refer to $\alpha_{-i,t}$
as the population action distribution at time $t$ (from player $i$'s point of view).

We let $\mathfrak{F}_A$ denote all Borel probability measures over $A$. Note that the population action
distribution lies in $\mathfrak{F}_A$.

Transition probabilities and payoff. We denote the payoff by $\pi(x, a, \alpha)$, and the transition kernel
by $\mathbf{P}(\cdot \mid x, a, \alpha)$, where $\alpha$ is a population action distribution, i.e., an element of $\mathfrak{F}_A$.

Recall that in defining mean field equilibrium in Section 2.2, we consider two maps $\mathcal{P}(f)$ and
$\mathcal{D}(\mu, f)$; the former gives the set of optimal oblivious strategies given a population state $f$, and
the latter gives the set of invariant distributions under a kernel with strategy $\mu$ and population
state $f$. Those maps are analogously defined for action-coupled stochastic games, but with $f$ as
the population action distribution rather than the population state; we omit the formal details.
With a slight abuse of notation, we let $\mathcal{P}(\alpha)$ be the set of optimal oblivious strategies for a player,
given population action distribution $\alpha$; and we let $\mathcal{D}(\mu, \alpha)$ be the set of invariant distributions of
the dynamics induced by oblivious strategy $\mu$ and population action distribution $\alpha$.

In order to define mean field equilibrium, we require one additional function. Given a population
state $f$ and an oblivious strategy $\mu$, let $\hat{\mathcal{D}}(\mu, f)$ give the resulting population action distribution;
i.e., for Borel sets $S$:

$$\hat{\mathcal{D}}(\mu, f)(S) = f(\mu^{-1}(S)). \tag{15}$$

Note that $\mu^{-1}(S)$ is the set of states $x$ such that $\mu(x) \in S$. In order for this definition to be well
posed, we require the strategy $\mu$ to be Borel measurable; to avoid this issue we simply assume that
all model primitives are continuous, i.e., that Assumption 1 holds. Under this assumption it can
be shown that we can restrict attention to Borel measurable strategies $\mu$.
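
On a finite state space, the population action distribution is simply the pushforward of the
population state through the strategy. The following minimal sketch assumes the population state
and the strategy are given as dictionaries keyed by state; the function name and the example
values are hypothetical.

```python
from collections import defaultdict

def action_distribution(f, mu):
    """Push the state distribution f forward through the strategy mu.

    f:  dict mapping each state to its probability
    mu: dict mapping each state to the action played in that state
    Returns a dict mapping each action a to the mass of states x with mu(x) = a.
    """
    alpha = defaultdict(float)
    for x, prob in f.items():
        alpha[mu[x]] += prob
    return dict(alpha)

f = {0: 0.2, 1: 0.3, 2: 0.5}
mu = {0: "low", 1: "low", 2: "high"}
alpha = action_distribution(f, mu)
print(alpha)
```

Each action receives the total mass of the states that map to it, so the result is again a
probability distribution, now over the action set.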

If every agent conjectures that $\alpha$ is the long run population action distribution, then every agent
would prefer to play an optimal oblivious strategy $\mu$. On the other hand, if every agent plays $\mu$,
and the long run population action distribution is indeed $\alpha$, then $\alpha$ must also be the population
action distribution that results from an invariant distribution in $\mathcal{D}(\mu, \alpha)$. This yields the following
definition of a mean field equilibrium for action-coupled stochastic games.

Definition 7. A strategy $\mu$, population state $f$, and population action distribution $\alpha$ constitute a mean field equilibrium of an action-coupled stochastic game $\Gamma$ if $\mu \in \mathcal{P}(\alpha)$, $f \in \mathcal{D}(\mu, \alpha)$, and $\alpha = \hat{\mathcal{D}}(\mu, f)$.
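
As a rough illustration of the last condition in the finite-state case, the following sketch computes the population action distribution induced by a population state and an oblivious strategy, i.e., the map $S \mapsto f(\mu^{-1}(S))$. The function name `action_distribution` and the dictionary representation are hypothetical choices made only for this example.

```python
from collections import defaultdict

def action_distribution(f, mu):
    """Push a finite population state f forward through a strategy mu.

    f  : dict mapping state -> probability mass (assumed to sum to 1)
    mu : dict mapping state -> action (an oblivious strategy)
    Returns a dict mapping action -> mass, i.e. alpha({a}) = f(mu^{-1}({a})).
    """
    alpha = defaultdict(float)
    for x, mass in f.items():
        alpha[mu[x]] += mass  # every state x with mu(x) = a contributes its mass to a
    return dict(alpha)

# Illustrative numbers only.
f = {0: 0.5, 1: 0.3, 2: 0.2}
mu = {0: 0, 1: 1, 2: 1}
alpha = action_distribution(f, mu)
```

Here states 1 and 2 both map to action 1, so their masses are pooled in the resulting action distribution.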

An action-coupled stochastic game with complementarities is then defined exactly as in Definition 6, but with the population state $f$ replaced by the population action distribution $\alpha$. Extending the argument in the proof of Theorem 2, we can prove the following theorem.

Theorem 4. Suppose Assumption 1 holds. Then there exists a mean field equilibrium for any action-coupled stochastic game with complementarities $\Gamma$.

As is clear from the proof, the same monotonicity properties employed to prove existence of a mean field equilibrium can also be used to extend Corollary 1 (establishing the existence of a "largest" and "smallest" mean field equilibrium) as well as Propositions 1 and 2 (establishing convergence of best response dynamics and myopic learning dynamics, respectively). We omit the details of these derivations, as they closely mirror the earlier development in the paper.

7. Separable Stochastic Games 

As the preceding sections illustrate, stochastic games with complementarities possess a number of 
properties that make them amenable to equilibrium analysis. One potential concern, however, is 
that the set of models admitted by Definition 6 may be somewhat limiting. Consider the following 
example. 

Example 7 (Linear dynamics). Consider a simple model where the distribution of the next state of an agent is "linear" in $x$ and $a$. Let $W$ be a zero mean random variable that takes countably many values, and fix positive constants $A$ and $B$. We consider a state space $\mathcal{X} = [-\underline{M}, \overline{M}]$, for some large positive constants $\underline{M}, \overline{M}$; and let $\mathcal{A}(x) = [\underline{a}, \overline{a}]$ for all $x$, where $\underline{a} < \overline{a}$. Define $\mathbf{P}$ as follows:

$$\mathbf{P}(x' \mid x, a) = \begin{cases} \mathrm{Prob}(Ax + Ba + W \geq \overline{M}), & x' = \overline{M}; \\ \mathrm{Prob}(Ax + Ba + W = x'), & -\underline{M} < x' < \overline{M}; \\ \mathrm{Prob}(Ax + Ba + W \leq -\underline{M}), & x' = -\underline{M}. \end{cases} \tag{16}$$

In this model, the state dynamics are essentially linear, except at the boundaries of the state space (where the state is truncated to lie within $[-\underline{M}, \overline{M}]$). Such a model might naturally arise in a wide range of examples, e.g., Examples 1, 2, or 4 (see Section 8 for details).
Unfortunately, such a kernel does not exhibit stochastically increasing differences in general. To see this, we consider a simple instance where $\mathrm{Prob}(W = 1) = \mathrm{Prob}(W = -1) = 1/2$, and $\underline{M} = \overline{M} = M > 1$. Consider any nondecreasing function $\phi(x)$, and fix $x$ and $a$ such that $|Ax + Ba| < M - 1$. Then:

$$E[\phi(x') \mid x, a] = \tfrac{1}{2}\,\phi(Ax + Ba + 1) + \tfrac{1}{2}\,\phi(Ax + Ba - 1).$$

In general, the right hand side exhibits increasing differences in $x$ and $a$ only if $\phi$ is locally convex. This is easiest to see for differentiable $\phi$: in that case the cross partial derivative $\partial^2 \phi(Ax + Ba + 1)/\partial x\,\partial a$ has to be nonnegative to ensure increasing differences, which only holds if $\phi''(Ax + Ba + 1) \geq 0$. For general nondecreasing $\phi$, therefore, the expectation $E[\phi(x') \mid x, a]$ need not exhibit increasing differences in $x$ and $a$. □
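
The failure of increasing differences can be checked numerically. The sketch below evaluates the two-point expectation for the noise $W = \pm 1$ at interior states and compares a nondecreasing concave test function with a nondecreasing convex one. The choices $A = B = 1$, the test functions, and the helper names are assumptions made for this illustration only.

```python
import math

A, B = 1.0, 1.0  # illustrative constants, not taken from the text

def expected_phi(phi, x, a):
    """E[phi(x') | x, a] for x' = Ax + Ba + W, W = +1 or -1 w.p. 1/2 (interior case)."""
    m = A * x + B * a
    return 0.5 * phi(m + 1.0) + 0.5 * phi(m - 1.0)

def cross_difference(phi, x_lo, x_hi, a_lo, a_hi):
    """g(x_hi,a_hi) - g(x_lo,a_hi) - g(x_hi,a_lo) + g(x_lo,a_lo).

    This quantity is nonnegative whenever g has increasing differences.
    """
    g = lambda x, a: expected_phi(phi, x, a)
    return g(x_hi, a_hi) - g(x_lo, a_hi) - g(x_hi, a_lo) + g(x_lo, a_lo)

concave = lambda y: -math.exp(-y)  # nondecreasing and concave
convex = lambda y: math.exp(y)     # nondecreasing and convex

d_concave = cross_difference(concave, 0.0, 1.0, 0.0, 1.0)
d_convex = cross_difference(convex, 0.0, 1.0, 0.0, 1.0)
```

With the concave test function the cross difference is negative, so increasing differences fail; with the convex one it is positive.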

The preceding example highlights a deficiency in stochastic games with complementarities: while 
a rich class analytically, they do present some restrictions from a modeling standpoint. In this 
sense, complementarity can appear to be a brittle property. 

However, this same brittleness can actually become an advantage: although at first glance it 
may appear that complementarity fails, often simple transformations can lead to games that admit 
analysis via complementarity methods even if the original game did not. (A common example is 
the class of log-supermodular games used extensively in oligopoly theory, where the logarithm of 
the profit function may be supermodular; see, e.g., Milgrom and Roberts (1990) and Vives (1990) 
for details.) 

In this section we demonstrate that a wide range of models, including those with dynamics 
similar to Example 7, can be transformed to standard stochastic games with complementarities. 
Further, the class of models we develop has the benefit that the assumptions are typically easier 
to check in practice. This significantly widens the applicability of our theory to models where the 
desired monotonicity properties may not be immediately apparent. 

The class of games we consider in this section features a payoff that is separable in the state and action. We have the following definition.

Definition 8. A separable stochastic game is a stochastic game $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$ with the following additional properties.

1. Actions. There exist $\underline{a}, \overline{a}$ such that $\mathcal{A}(x) = [\underline{a}, \overline{a}]$ for all $x$.

2. Payoff. The single period payoff to player $i$ at time $t$ can be written as $\pi(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}) = v(x_{i,t}, f^{(m)}_{-i,t}) - c(a_{i,t})$, where we refer to $v(x, f)$ as the utility at state $x$ and population state $f$, and $c(a)$ as the cost for action $a$.

3. Transition probabilities. The state of a player evolves according to a Markov process with the following transition probabilities. If the state of player $i$ at time $t$ is $x_{i,t} = x$ and the player takes an action $a_{i,t} = a$ at time $t$, then the next state is distributed according to the Borel probability measure $\mathbf{P}(\cdot \mid h(x, a), f)$, where for Borel sets $S \subset \mathcal{X}$,

$$\mathbf{P}(S \mid h(x,a), f) = \mathrm{Prob}\big(x_{i,t+1} \in S \mid x_{i,t} = x,\ a_{i,t} = a,\ f^{(m)}_{-i,t} = f\big). \tag{17}$$

Note that $\mathbf{P}$ depends on $x$ and $a$ only through the function $h(x,a)$; we refer to $h(x,a)$ as the kernel parameter. We assume that $h$ takes values in a compact interval $H = [\underline{h}, \overline{h}] \subset \mathbb{R}$.

In this section we provide insight into separable stochastic games with complementarities. We have the following definition.

Definition 9. A separable stochastic game with complementarities is a separable stochastic game $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$ with the following properties.

1. Nondecreasing payoff and convex cost. The utility function $v(x,f)$ is nondecreasing in $x$, and the cost function $c(a)$ is nondecreasing and convex in $a$. Further, for fixed $f$, $\sup_{x \in \mathcal{X}} |v(x,f)| < \infty$.

2. Payoff complementarity. The utility function $v(x, f)$ has increasing differences in $x$ and $f$.

3. Monotone transition kernel. The transition kernel $\mathbf{P}(\cdot \mid h, f)$ is stochastically nondecreasing in $h$ and $f$. Further, $\mathbf{P}(\cdot \mid h, f)$ is continuous in $h$ (w.r.t. the topology of weak convergence of probability measures on $\mathcal{X}$).

4. Transition kernel complementarity. The transition kernel $\mathbf{P}(\cdot \mid h, f)$ has stochastically increasing differences in $h$ and $f$.

5. Kernel parameter monotonicity and complementarity. The kernel function $h(x,a)$ is supermodular in $x$ and $a$, nondecreasing in the state $x$, and concave and nondecreasing in the action $a$.

6. Countable noise. For each $h$, the support $\{x' : \mathbf{P}(x' \mid h) > 0\}$ is countable.

We proceed by reparametrizing the strategy in terms of the kernel parameter; under this reparametrization, the resulting model is revealed to be a special case of the general model studied earlier in this paper.

Formally, suppose we are given a separable stochastic game with complementarities $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$. Before we proceed, we require some additional notation. For each $x$, define:

$$H(x) = \{h : h(x, a) = h \text{ for some } a \in \mathcal{A}\}. \tag{18}$$

Thus $H(x)$ is the image of $\mathcal{A}$ under $h(x, \cdot)$. In addition, for each $h \in H(x)$, define:

$$C(x,h) = \inf_{a \in \mathcal{A} \,:\, h(x,a) = h} c(a). \tag{19}$$
Thus $C(x,h)$ is the minimum cost incurred to achieve kernel parameter $h$ when at state $x$.

The next lemma establishes some basic properties of $H$ and $C$. It uses the assumption that the cost function is a convex function of action $a$.

Lemma 1. Suppose $\Gamma$ is a separable stochastic game with complementarities. Suppose $H(x)$ and $C(x,h)$ are defined as in (18) and (19), respectively. Then for each $x$, $H(x)$ is a compact interval, and the sets $H(x)$ are nondecreasing in $x$.

The function $C(x,h)$ is convex and nondecreasing in $h$ on $H(x)$ for each $x$, and nonincreasing in $x$ for each $h$ as long as $h \in H(x)$. Further, for all $x$:

$$\inf_{h \in H(x)} C(x,h) = c(\underline{a}). \tag{20}$$

If $x' > x$, $h', h \in H(x') \cap H(x)$, and $h' > h$, then:

$$C(x', h') - C(x, h') \leq C(x', h) - C(x, h).$$

In other words, $C(x,h)$ has decreasing differences in $x$ and $h$.
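
To make the objects in Lemma 1 concrete, the sketch below evaluates the minimum cost $C(x,h)$ by brute force on an action grid and checks the decreasing differences inequality at one pair of points. The linear kernel parameter $h(x,a) = Ax + Ba$ with $A = B = 1$, the quadratic cost, the grid on $[0,1]$, and all function names are assumptions chosen for illustration.

```python
def kernel_parameter(x, a, A=1.0, B=1.0):
    """Hypothetical linear kernel parameter h(x, a) = A x + B a."""
    return A * x + B * a

def cost(a):
    """Hypothetical cost, convex and nondecreasing on [0, 1]."""
    return a * a

def min_cost(x, h, actions, tol=1e-9):
    """C(x, h): least cost over grid actions a with h(x, a) = h (within tol).

    Returns None when no grid action attains h, i.e. h is not in H(x).
    """
    feasible = [cost(a) for a in actions if abs(kernel_parameter(x, a) - h) <= tol]
    return min(feasible) if feasible else None

actions = [k / 10 for k in range(11)]  # grid on [0, 1]

x_lo, x_hi = 0.0, 0.5
h_lo, h_hi = 0.6, 0.9  # both lie in H(x_lo) and H(x_hi), whose overlap is [0.5, 1.0]

# Decreasing differences: C(x', h') - C(x, h') <= C(x', h) - C(x, h).
lhs = min_cost(x_hi, h_hi, actions) - min_cost(x_lo, h_hi, actions)
rhs = min_cost(x_hi, h_lo, actions) - min_cost(x_lo, h_lo, actions)
```

In this instance each feasible $h$ is attained by a single action, so the infimum in (19) is simply the cost of that action.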

We now use Lemma 1 to define a new stochastic game, which is in fact a stochastic game with complementarities as in Definition 6.

Proposition 3. Suppose that $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$ is a separable stochastic game with complementarities. Define a new game $\hat{\Gamma} = (\hat{\mathcal{X}}, \hat{\mathcal{A}}, \hat{\mathbf{P}}, \hat{\pi}, \beta)$, where:

1. $\hat{\mathcal{X}} = \mathcal{X}$;

2. $\hat{\mathcal{A}}(x) = H(x)$ for all $x \in \mathcal{X}$;

3. $\hat{\mathbf{P}}(x' \mid x, h, f) = \mathbf{P}(x' \mid h, f)$; and

4. $\hat{\pi}(x, h, f) = v(x, f) - C(x, h)$,

with $H(x)$ and $C(x,h)$ defined in (18) and (19), respectively. Then $\hat{\Gamma}$ is a stochastic game with complementarities, cf. Definition 6.

Based on the preceding proposition we have the following theorem. 

Theorem 5. Any separable stochastic game with complementarities $\Gamma$ has a mean field equilibrium.

The preceding result can be extended, of course, to provide analogs of Corollary 1 (existence of a largest and smallest equilibrium), as well as Propositions 1 and 2 (convergence of best response dynamics and myopic learning dynamics, respectively). (The appropriate generalization of Assumption 1 is that $\mathbf{P}$ should be jointly continuous in $h$ and $f$, and $h$ should be jointly continuous in $x$ and $a$.) Note, however, that the dynamics defined here are in the modified strategy space, where the "action" is the kernel parameter chosen. In particular, the dynamics in the original action space may not be monotone at all; nevertheless, the eventual limit point is a mean field equilibrium.

It is also straightforward to generalize the comparative statics result in Theorem 3 to separable stochastic games using the same transformation as the preceding result. In addition, the definition of a separable stochastic game with complementarities can be naturally extended to separable action-coupled stochastic games with complementarities (simply by replacing the population state $f$ by the population action distribution $\alpha$ in the payoff and transition kernel), and an argument similar to Proposition 3 shows that such a game can be transformed to a standard action-coupled stochastic game with complementarities.

We conclude this section by noting that the preceding results continue to hold in a setting where the payoff is not necessarily monotone, as long as dynamics are decoupled. Formally, suppose that $\Gamma = (\mathcal{X}, \mathcal{A}, \mathbf{P}, \pi, \beta)$ is a stochastic game that satisfies all the conditions in Definition 9, except that $v$ is not necessarily nondecreasing in $x$. Suppose in addition that $\mathbf{P}(\cdot \mid h, f)$ does not depend on $f$; thus we denote the kernel simply $\mathbf{P}(\cdot \mid h)$. In this model it can again be shown that a mean field equilibrium exists, as we now describe.

The proof of Theorem 2 (and subsequent results on ordering of equilibria and convergence) uses the fact that the payoff is nondecreasing in $x$ to show that $\int_{\mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(dx' \mid x, a, f)$ is supermodular in $(x, a)$ and has increasing differences in $(x, a)$ and $f$ (see Lemma 3). In order for the expectation to preserve these properties, the integrand must be nondecreasing in state; this is why we require the payoff to be nondecreasing. However, if $\mathbf{P}$ only depends on the kernel parameter, then we can show that $\int_{\mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(dx' \mid h)$ has increasing differences in $x$ and $h$, even if the payoff is not necessarily nondecreasing. For details, we refer the reader to Lemma 13 in the Appendix. Substitution of this lemma in the proof of Theorem 2 yields the desired result.

8. Examples 

In this section we revisit the five examples mentioned in the Introduction: interdependent security; 
collaborative filtering; dynamic search with learning; coordination games; and oligopolies with 
complementarities. We show that each of these examples can be formalized within the framework 
developed in this paper, so that the existence and convergence results we have proven apply. 

8.1. Example 1: Interdependent Security 

We consider a dynamic model of interdependent security in a computer cluster, where the state $x$ gives the security level of a player. Players can improve their security level through investment; an investment $a$ incurs a cost $c(a)$ that is convex and nondecreasing in $a$. A higher action leads to improvement in the security level, and with no or little investment the security level deteriorates due to depreciation. Thus a reasonable model for the dynamic evolution of the security level might be the linear dynamics in (16), where $\underline{M} = 0$ and $\overline{M} > 0$, and $A = 1$, $B > 0$, and $W$ has a negative expected value. Let $p(x)$ be the probability of a bad event occurring when an individual computer is at the security level $x$, and let $L$ be the cost of this bad event to the host.

We consider a simplified model where at each time step, an individual computer "talks" to a randomly selected computer in the network. (This talk can be in the form of establishing a TCP connection, exchanging data, emails, etc.) Thus, at each time, there is a probability that an individual computer will suffer a bad event because of the security level of the rest of the network. Let $f_{-i,t}(y)$ be the fraction of all computers (except computer $i$) that have their security level at $y$ at time $t$. Then, at each time step, computer $i$ receives an expected value that is given as:

$$v(x_{i,t}, f_{-i,t}) = -p(x_{i,t})\,L - \big(1 - p(x_{i,t})\big) \Big( \sum_{y} f_{-i,t}(y)\, p(y) \Big).$$

The first part of the payoff reflects the security of host $i$. The scaling factor $1 - p(x_{i,t})$ in the second term is the probability that no bad event happens because of the individual security level. The term $\sum_{y} f_{-i,t}(y)\, p(y)$ represents the average security level of the rest of the network. Because $p$ is decreasing, it is straightforward to verify that the product of these two terms exhibits strategic complementarities between the security level of agent $i$, and the security level of every other agent. It follows that this is a separable stochastic game with complementarities.
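
The complementarity claim can be spot-checked numerically. The sketch below evaluates the utility $-p(x)L - (1 - p(x)) \sum_y f(y) p(y)$ under two population states ordered by first-order stochastic dominance and compares the gain from raising one's own security level. The specific decreasing function `p`, the value of `L`, and the two distributions are hypothetical and chosen only for illustration.

```python
def p(x):
    """Hypothetical bad-event probability, decreasing in the security level x."""
    return 1.0 / (1.0 + x)

def utility(x, f, L=2.0):
    """v(x, f) = -p(x) L - (1 - p(x)) * sum_y f(y) p(y), with f a dict state -> mass."""
    eta = sum(mass * p(y) for y, mass in f.items())
    return -p(x) * L - (1.0 - p(x)) * eta

f_low = {0: 0.7, 5: 0.3}
f_high = {0: 0.2, 5: 0.8}  # first-order stochastically dominates f_low

# Gain from moving own security level from 1 to 3 under each population state.
gain_low = utility(3, f_low) - utility(1, f_low)
gain_high = utility(3, f_high) - utility(1, f_high)
```

The gain is larger under the more secure population, which is the increasing differences property in $x$ and $f$ for this instance.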

8.2. Example 2: Collaborative Filtering 

As a canonical example, we consider the collaborative filtering system used by a recommendation 
engine on a movie rental site such as Netflix. We let the state x be the quality of a user's profile, 
and assume x takes values in a compact interval. The action a represents the effort put forth in 
updating her profile, e.g., through rating more movies; actions are costly, with c(a) denoting the 
cost incurred by action a. We assume c is convex. If user i does not put forth any effort at time t, 
then the profile becomes "stale," i.e., the quality of the profile drops over time. Thus in this model 
the quality can be modeled via dynamics as in (16) as well, where A = 1, B > 0, and W has negative 
expected value. 

Based on the quality of a user's profile $x$ as well as the profile of other users in the system (captured by the population state $f$), the recommendation system suggests a movie to a user. Let $v(x,f)$ denote the expected desirability of the movie recommended to a user, given their profile quality $x$ and the population state $f$. Observe that $v$ will increase if $x$ increases, since a more accurate profile results in more accurate recommendations. However, for most collaborative filtering systems, it is also the case that if others have higher quality profiles, then the marginal return to a higher quality profile is higher; for example, this would be the case under a nearest neighbor algorithm as is commonly used by a variety of online recommendation systems. Thus such a model is a separable stochastic game with strategic complementarities. Collaborative filtering systems are one example of a setting with positive network effects; games with strategic complementarities are commonly used to model settings with positive network effects.

8.3. Example 3: Dynamic Search with Learning 

We consider a model where at each time step, a trader exerts effort to search for trading partners. As discussed in Example 3 in the Introduction, traders' experience grows with both their own effort and the effort of others. To formalize this notion, suppose traders choose effort each time step from $[0, \overline{a}]$, where $\overline{a} > 0$. Let the state $x$ denote the current search productivity of a given trader; we assume $x \in [0, \overline{x}]$ where $\overline{x} > 0$. Finally, given a population action distribution $\alpha$, we let $\eta(\alpha)$ be the mean effort, defined as:

$$\eta(\alpha) = \int a'\, \alpha(da').$$

We then assume that traders receive a payoff $\pi(x, a, \alpha)$ defined as:

$$\pi(x, a, \alpha) = x\, a\, \eta(\alpha) - c(a),$$

where $c(a)$ is a cost of effort. In particular, observe that the first term of the payoff increases as the search productivity increases, the player's own effort increases, or the mean effort of others in the system increases.

As a trader exerts effort, they gain experience and their search becomes more productive. Further, as discussed in Example 3 in the Introduction, traders' experience grows with both their own effort and the effort of others; as in our other examples, with insufficient effort the search productivity decreases as previously acquired experience becomes outdated. Thus we assume the transition kernel is defined as in (8), where:

$$q(x, a, \alpha) = \frac{x + a + \eta(\alpha)}{\overline{x} + 2\overline{a}}.$$

This is a model where traders are coupled through their actions, cf. Section 6. It is straightforward to verify that this model exhibits the complementarity properties required for an action-coupled stochastic game with complementarities.

8.4. Example 4: Coordination Games

In this model, a collection of agents are interested in coordinating on a common state; the model we present is related to the one studied by Huang et al. (2006). Actions can alter the state, but any nonzero actions are costly. We assume that $\mathcal{X} = [-M, M]$, and $\mathcal{A} = [0, L]$. We assume dynamics are linear, cf. (16), where $A, B > 0$, and $W$ has negative expected value. Each player tries to minimize mean squared error to the other players' average state, and incurs a quadratic cost for taking nonzero action. If $f$ is the current population state, we let:

$$\eta(f) = \int_{\mathcal{X}} x\, f(dx).$$

We assume the payoff of a player is:

$$\pi(x, a, f) = -(x - \eta(f))^2 - a^2.$$

It is straightforward to verify that $-(x - \eta)^2$ has increasing differences in $x$ and $\eta$, since the cross partial derivative with respect to $x$ and $\eta$ is positive (Topkis 1998). It follows that $v(x,f) = -(x - \eta(f))^2$ has increasing differences in $x$ and $f$. Further, $c(a) = a^2$ is convex and nondecreasing in $a \in [0, L]$.

In principle we would like to claim this is a separable stochastic game with complementarities, but the payoff is not monotonic in $x$. As discussed at the end of Section 7, however, if the transition kernel does not depend on $f$ (as is the case in (16)), then the payoff need not be monotonic in $x$, and all our results continue to hold. Thus existence of equilibrium and convergence of MLD can be guaranteed for this model. Notably, our convergence result provides justification for a distributed control interpretation of this coordination game, where multiple individual agents can execute a myopic algorithm and yet converge to a common state.

We conclude by noting one peculiarity of our formulation: we have $\mathcal{A} = [0, L]$, so in particular, players cannot move backwards (i.e., take negative action). This assumption is made to ensure that $c(a)$ is nondecreasing, as required for a separable stochastic game with complementarities. However, in the original formulation of Huang et al. (2006), $W$ has zero expected value, but players are allowed to take both positive and negative actions. This expanded formulation can still be analyzed using the methods of this paper.

Formally, suppose that $W$ has zero expected value, and $\mathcal{A} = [-L, L]$. Then even though $c(a) = a^2$ is no longer nondecreasing on $\mathcal{A}$, we can still show in this specific model that $C(x,h)$ (cf. (19)) exhibits decreasing differences in $x$ and $h$. To see this, note that with the linear dynamics of (16), $C(x,h) = (h - Ax)^2 / B^2$ for $h \in H(x)$. Upon differentiating it follows that $\partial^2 C / \partial x\, \partial h < 0$, establishing that $C(x,h)$ has decreasing differences in $x$ and $h$. By substituting this observation in the analysis of Section 7 we recover all the results of that section.
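
For concreteness, the differentiation step can be written out as follows, assuming the kernel parameter of the linear dynamics is $h(x,a) = Ax + Ba$ with $A, B > 0$, so that each $h \in H(x)$ is attained by exactly one action:

```latex
\begin{align*}
h = Ax + Ba \;&\Longrightarrow\; a = \frac{h - Ax}{B}, \\
C(x,h) &= c\!\left(\frac{h - Ax}{B}\right) = \frac{(h - Ax)^{2}}{B^{2}}, \\
\frac{\partial C}{\partial h} &= \frac{2\,(h - Ax)}{B^{2}}, \\
\frac{\partial^{2} C}{\partial x\,\partial h} &= -\frac{2A}{B^{2}} < 0 .
\end{align*}
```

The sign of the cross partial depends only on $A$ and $B$, not on whether the action attaining $h$ is positive or negative.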
8.5. Example 5: Oligopolies and Complementary Goods

Consider an oligopoly scenario where the goods produced by firms are complements. As firms gain experience in production, their cost of production decreases. We let $x \in [0, \overline{x}]$ be the experience level of a firm, and let $a \in [0, \overline{a}]$ be the quantity produced by a firm. Let $P(a, \alpha)$ be the inverse demand curve seen by a firm, where $\alpha$ is the population action distribution. Thus this is a monopolistic competition model, where firms sell differentiated products and the market clearing price seen by a firm depends on the quantities produced by all firms. The per period payoff to a firm is:

$$\pi(x, a, \alpha) = a\, P(a, \alpha) - c(x, a),$$

where $c(x, a)$ is the cost of producing quantity $a$ when a firm's experience level is $x$. Note that since $\alpha$ is the population action distribution, this is a game with coupling through actions.

We assume that firms' experience levels increase with higher quantities produced; for example, we might consider dynamics of the form (8) with $q(x, a, \alpha)$ defined as:

$$q(x, a, \alpha) = \frac{x + a}{\overline{x} + \overline{a}}.$$

Note in particular that in this model experience levels evolve independently across firms.

We note that the cost of production will typically decrease with the experience level. Thus, $c(x, a)$ is decreasing in $x$. Further, at a higher experience level, a firm's marginal cost of production typically decreases, so we expect $c(x, a)$ to have decreasing differences in $x$ and $a$. Finally, since the payoff is separable in $x$ and $\alpha$, it has increasing differences in those two parameters.

Since goods are complements, if $\alpha' \succeq_{SD} \alpha$, then we expect that for a fixed production quantity $a$ the price is higher at $\alpha'$, i.e., $P(a, \alpha') \geq P(a, \alpha)$ for every fixed $a$. Furthermore, it is natural that if $\alpha' \succeq_{SD} \alpha$, then for a slight increase in production, a firm can charge a higher price for its goods. In other words, $P(a, \alpha)$ should have increasing differences in $a$ and $\alpha$. Under these natural assumptions, it is straightforward to verify that $\pi(x, a, \alpha)$ has increasing differences between $a$ and $\alpha$. Thus this game is an action-coupled stochastic game with complementarities.

9. Numerical Analysis

In this section, we study a numerical example that highlights the utility of mean field equilibrium as a tool to analyze large scale stochastic games with complementarities. Specifically, we consider an interdependent security model, cf. Example 1 and Section 8.1. We have three main goals. First, we illustrate that mean field equilibrium provides basic structural insight into equilibria in a simple and computable fashion; in particular, we provide comparative statics analysis with respect to cost and transition kernel parameters. Second, we use the model to evaluate the effect of heterogeneous player populations on the equilibrium outcome. Finally, we also evaluate how well myopic learning dynamics perform in systems with finitely many players. Our analysis suggests that, particularly for stochastic games with complementarities, mean field equilibrium is a powerful analytical tool that provides rich structural insights relatively painlessly to the modeler.

9.1. Model 

We assume that the security level of a player is a nonnegative integer $x \in [0, 50]$ with the interpretation that a higher value of the state implies a higher security level. The probability that a player does not get infected is assumed to be proportional to its state; after normalization we have:

$$1 - p(x) = \frac{\kappa x}{1 + \kappa x},$$

where $\kappa > 0$ is a scaling factor. At each time step, a player takes an integer action $a \in [0, 25]$ to improve its security level.³ This action results in a cost $ca$, where $c > 0$ is the marginal cost of action. The payoff also depends on the average security level of other players in the system. For a fixed player $i$, we let $\eta_{-i} = \sum_{y} f_{-i,t}(y)\, p(y)$ denote the average security level of the system (from the viewpoint of player $i$). Thus, the per period payoff to player $i$ is given by:

$$\pi(x_i, a_i, f_{-i}) = -p(x_i) - (1 - p(x_i))\, \eta_{-i} - c\, a_i.$$

Here $f$ is the population state.

For the purposes of this example, we restrict attention to a separable stochastic game, where the dynamics depend on the kernel parameter given by $h(x,a) = x + a$. At each time step, based on the kernel parameter, the next state is stochastically distributed as follows:

$$\mathbf{P}(x' \mid h) = \begin{cases} \mathrm{Prob}(h + W \geq 50), & x' = 50; \\ \mathrm{Prob}(h + W = x'), & 0 < x' < 50; \\ \mathrm{Prob}(h + W \leq 0), & x' = 0. \end{cases}$$

Here $W$ is a random variable that takes values in the discrete set $\{-1, 0, 1\}$ with the probability mass function given by:

$$\mathrm{Prob}(W = w) = \begin{cases} q_{-1}, & w = -1; \\ q_0, & w = 0; \\ q_1, & w = 1. \end{cases}$$

We initially choose $q_{-1} = q_1 = 0.4$, and $q_0 = 0.2$.

3 Note that in separable stochastic games with complementarities as defined in Section 7, we require actions to be 
chosen from a continuous interval. However, for computational purposes, we consider a discrete approximation to the 
model proposed there, where actions are drawn from a discrete set. 
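
As a minimal sketch of how the truncated kernel might be tabulated for computation, the following builds a matrix indexed by kernel parameter and next state. The function name `truncated_kernel`, the array layout, and the bound `h_max = 75` (which assumes $h = x + a$ with $x \leq 50$ and $a \leq 25$) are illustrative choices, not part of the model description.

```python
import numpy as np

def truncated_kernel(q_minus=0.4, q_zero=0.2, q_plus=0.4, x_max=50, h_max=75):
    """Return P with P[h, x'] = Prob(next state = x' | kernel parameter h).

    The next state is h + W clipped to [0, x_max], with W in {-1, 0, 1}.
    """
    P = np.zeros((h_max + 1, x_max + 1))
    for h in range(h_max + 1):
        for w, q in ((-1, q_minus), (0, q_zero), (1, q_plus)):
            x_next = min(max(h + w, 0), x_max)  # truncate at the boundaries
            P[h, x_next] += q
    return P

P = truncated_kernel()
```

Each row of `P` is a probability distribution over next states; rows with $h$ near 0 or above 50 place the truncated mass on the boundary states.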
Figure 1 Cumulative distribution function of population state at $t = 1000$ under L-MLD for $\kappa = 0.05$ and $c = 0.005, 0.01, 0.05$.

A player maximizes its expected discounted payoff with the discount factor $\beta = 0.75$. We compute the mean field equilibrium using the L-MLD algorithm, where we use value iteration to compute an optimal oblivious strategy given the current population state. For the purposes of this simulation, we declare value iteration to have converged if the total difference (across all states) between iterates is less than $10^{-4}$. Having computed the optimal strategy for a given population state, the next population state is computed using the recursion given in (13). For each simulation scenario, we run 1000 iterations; for reference, we note that each run takes approximately 8 to 10 minutes on an Intel Core 2 Quad Q6600 (2.4GHz) machine with 3GB RAM. At the end of 1000 iterations, the total variation distance between the current population state and the previous population state is always less than $5 \times 10^{-4}$, so we refer to the population state at $t = 1000$ as the mean field equilibrium population state.
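
The computation described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: recursion (13) is not restated in this section, so the update $f_{t+1}(x') = \sum_x f_t(x)\, \mathbf{P}(x' \mid x + \mu_t(x))$ is an assumed form; the infection probability is taken as $p(x) = 1/(1 + \kappa x)$; the initial population state (all mass at state 0) and the shortened run of 50 iterations are also assumptions.

```python
import numpy as np

X_MAX, A_MAX = 50, 25
KAPPA, C_MARG, BETA = 0.05, 0.05, 0.75
Q = {-1: 0.4, 0: 0.2, 1: 0.4}

states = np.arange(X_MAX + 1)
actions = np.arange(A_MAX + 1)
p_bad = 1.0 / (1.0 + KAPPA * states)  # assumes 1 - p(x) = kappa x / (1 + kappa x)

# P[h, x'] for kernel parameter h = x + a, next state clipped to [0, X_MAX].
P = np.zeros((X_MAX + A_MAX + 1, X_MAX + 1))
for h in range(P.shape[0]):
    for w, q in Q.items():
        P[h, min(max(h + w, 0), X_MAX)] += q

def best_response(f, tol=1e-4, max_sweeps=10_000):
    """Value iteration for an optimal oblivious strategy against a fixed population state f."""
    eta = float(f @ p_bad)
    flow = -p_bad - (1.0 - p_bad) * eta  # per-period utility before the action cost
    V = np.zeros(X_MAX + 1)
    for _ in range(max_sweeps):
        cont = P @ V  # cont[h] = E[V(x') | h]
        # q_val[x, a] = flow[x] - c a + beta * cont[x + a]
        q_val = (flow[:, None] - C_MARG * actions[None, :]
                 + BETA * cont[states[:, None] + actions[None, :]])
        V_new = q_val.max(axis=1)
        done = np.abs(V_new - V).sum() < tol  # total difference across states
        V = V_new
        if done:
            break
    return q_val.argmax(axis=1)

def lmld(n_iter=50):
    """Iterate: best response to the current state, then the assumed mean field update."""
    f = np.zeros(X_MAX + 1)
    f[0] = 1.0  # assumed starting point: all mass at the lowest state
    for _ in range(n_iter):
        mu = best_response(f)
        f = f @ P[states + mu]  # f_next(x') = sum_x f(x) P(x' | x + mu(x))
    return f, mu

f_eq, mu_eq = lmld()
```

Tracking the total variation distance between successive values of `f` inside the loop would reproduce the stopping diagnostic described in the text.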

9.2. Comparative Statics: Marginal Cost 

Figure 1 plots the cumulative distribution function of the mean field equilibrium population state for $\kappa = 0.05$ and for different values of the marginal cost $c$ at the end of the 1000-th iteration. Observe that as the marginal cost increases from $c = 0.005$ to $c = 0.05$, the mass of the equilibrium population state shifts to lower security levels. This is as predicted by Theorem 3: at a higher marginal cost, it is costly to maintain a higher security level, and hence players tend to invest less, resulting in an equilibrium distribution with substantial weight at lower states. Note that even small changes in the marginal cost can significantly shift the equilibrium profile.

Figure 2 Cumulative distribution function of population state at $t = 1000$ under L-MLD for $\kappa = 0.05$ and marginal cost $c = 0.05$, with different distributions for $W$. In all cases $q_0 = 0.2$. The solid line is for $q_{-1} = q_1 = 0.4$; the dashed line is for $q_{-1} = 0.45$, $q_1 = 0.35$; and the dashed dotted line is for $q_{-1} = 0.5$, $q_1 = 0.3$.

9.3. Comparative Statics: Transition Kernel 

Figure 2 plots a different kind of comparative statics result, where we plot the cumulative distribution function of the mean field equilibrium population state for different noise distributions. We observe that as the mean of the noise distribution becomes negative, the equilibrium distribution tends to concentrate over lower states. One can interpret this negative mean as the tendency of the player's security to deteriorate over time, e.g., because the antivirus software installed on a machine becomes outdated. Thus, each player needs to constantly take an action to maintain its security level. For a fixed marginal cost of action, a more negative drift results in players moving toward lower security levels. Note that even for small negative drift in the noise distribution, the cumulative distribution rapidly concentrates over lower states. Thus in order to maintain a desired security level in a network, a network administrator should try to ensure that an individual player's security level does not depreciate quickly over time.

9.4. Heterogeneity 

We now consider a model where players may be heterogeneous. Specifically, we consider two types of players whose payoff function is parameterized by an interaction parameter $\delta$. This interaction parameter controls the effect of the security level of other players on the payoff of an individual player. The payoff function is then given by:

$$\pi(x, a, f; \delta) = -p(x) - \delta\,(1 - p(x))\,\eta(f) - ca,$$

where $\eta(f)$ is the mean of $f$. A higher value of $\delta$ implies that a player frequently interacts with other players in the network. Thus, the average security level of other players has a higher impact on its own security. As discussed in the conclusion, all the results of our paper continue to apply for a model with heterogeneous players.

For the purposes of this numerical example, we consider two values of the interaction parameter: a low value given by $\delta_L = 0.1$ and a high value given by $\delta_H = 0.9$. Figure 3 plots the mean of the mean field equilibrium population state as the fraction of players with lower $\delta$ increases. When all players have a high interaction parameter, they interact with each other more often. Hence each player has a high probability of a bad event, and so feels that a personal investment in security is not likely to be particularly productive. A "tragedy of the commons" ensues, and the resulting mean population state is quite low in equilibrium. On the other hand, even for a small fraction of players with the low interaction parameter, the mean security level of all players increases. This is because those players feel a positive benefit to investing in their own security level, and thus encourage others to invest as well, even those with the high interaction parameter. Thus in a real network, slightly limiting interaction between players can have a significant impact on the overall security level of the system.

9.5. Convergence of L-MLD: Finite vs. Mean Field 

As noted above, the mean field equilibrium is computed using L-MLD. In such dynamics, agents compute their current optimal strategy assuming that the population state remains fixed for all future time. Using this computed optimal strategy, the next population state is computed using equation (13). This process is repeated until convergence. In this computation, the next population state is a deterministic function of the previous distribution, and it depends on the optimal strategy via the transition kernel; this is a consequence of the mean field limit.

Figure 3 Mean of the mean field equilibrium population state vs. the fraction of players with low interaction parameter, i.e., $\delta = \delta_L$. Here $\kappa = 0.05$ and $c = 0.05$.

Here we ask the question: what happens if finitely many players use L-MLD? In other words, 
each player i in a game with m players observes the true population state at time t, and 
then executes L-MLD with respect to this population state. Errors are then introduced because 
the next population state is stochastic in a finite system. (See discussion at the end of Section 4.) 

In Figure 4, we plot the total variation error between successive population states, for m = 50 
and m = 1000 players. As we observe, for a small number of players (m = 50), the error can be 
high, and accumulates as time passes. However, for a large number of players (m = 1000), this error 
is considerably reduced. 
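
The gap between a finite system and its mean field limit can be illustrated with a small simulation sketch. This is not the security model of Section 9: it assumes a hypothetical three-state transition matrix `P` that is held fixed (no strategy is recomputed), and all function names are illustrative. It only shows how sampling noise among m players separates the empirical population state from the deterministic update, measured in total variation.

```python
import random

def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def mean_field_step(f, P):
    """Deterministic mean field update: next[j] = sum_i f[i] * P[i][j]."""
    n = len(f)
    return [sum(f[i] * P[i][j] for i in range(n)) for j in range(n)]

def finite_step(states, P, rng):
    """Each of the m players moves independently according to row P[state]."""
    n = len(P)
    return [rng.choices(range(n), weights=P[s])[0] for s in states]

def empirical(states, n):
    """Empirical distribution of the players' states."""
    m = len(states)
    return [states.count(j) / m for j in range(n)]

def max_tv_gap(m, P, steps, seed=0):
    """Largest TV gap between the m-player empirical state and the mean field state."""
    rng = random.Random(seed)
    n = len(P)
    states = [0] * m                  # all players start in state 0
    f = [1.0] + [0.0] * (n - 1)       # matching mean field initial condition
    worst = 0.0
    for _ in range(steps):
        states = finite_step(states, P, rng)
        f = mean_field_step(f, P)
        worst = max(worst, tv_distance(empirical(states, n), f))
    return worst

# Hypothetical 3-state kernel (rows sum to one); not taken from the paper.
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.3, 0.6]]

if __name__ == "__main__":
    for m in (50, 1000):
        print(m, round(max_tv_gap(m, P, steps=100), 3))
```

Because the kernel in this sketch does not react to the empirical state, it isolates pure sampling error; in the experiment described above, each player also re-optimizes against the observed population state, so errors can feed back into later periods.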

This effect is also seen in the limiting population state. In Figure 5, we plot the cumulative 
distribution function at t = 1000 for three cases: the mean field L-MLD; L-MLD with m = 50 
players; and L-MLD with m = 1000 players. As we observe, for m = 1000, the population state is 
very close to the population state obtained using the mean field deterministic update (cf. (13)).

Figure 4 Total variation distance between the actual distribution and the empirical distribution for L-MLD
with m = 50 and L-MLD with m = 1000 players. Here k = 0.05 and c = 0.05.



10. Conclusion 

This paper has considered existence of mean field equilibrium in games that exhibit strategic
complementarities in the states of the players. Our proofs exploit monotonicity and complementarity
properties of the model primitives to demonstrate that there exist both a "largest" and a "smallest"
mean field equilibrium among all equilibria where the strategy is nondecreasing in the state.
Further, we demonstrate that there exist natural myopic learning dynamics that converge to these
equilibria. Finally, we apply our results to specific examples, illustrating how games
with complementarities may be analyzed using our techniques.

We conclude by noting two extensions that can be developed for the models described here. 

1. Types. In our model players are homogeneous; however, this is not a consequential restriction,
and is made primarily for convenience. In Section 9.4, we considered a numerical example with
heterogeneous players. More generally, we can extend the definition of a stochastic game by assuming
that there exists a finite type space $\Delta$, with $\pi(x, a, f; \delta)$ and $\mathbf{P}(\cdot \mid x, a, f; \delta)$ the payoff and transition
kernel, respectively, of a type $\delta$ player. Further, we assume that the probability a player is of type $\delta$
is given by $\psi(\delta)$. With this extension, as long as the conditions of Definition 6 are satisfied for
each $\delta$, it is straightforward to extend our existence, convergence, and comparative statics results.
The main technical issue is that now a mean field equilibrium must provide an optimal strategy $\mu_\delta$
and population state $f_\delta$ for each $\delta$. We omit the details.

Figure 5 Cumulative distribution function of the population state at t = 1000 under L-MLD for k = 0.05 and
marginal cost c = 0.05. The plot includes the population state obtained under the deterministic mean
field L-MLD; L-MLD with m = 50 players; and L-MLD with m = 1000 players. (Horizontal axis: state.)

2. Multidimensional state and action spaces. A more difficult extension involves models where the 
state and action spaces may be multidimensional lattices. The main challenge here arises because 
the set of distributions on a multidimensional compact lattice X is not generally a lattice in the 
first order stochastic dominance ordering; see Kamae et al. (1977) for details. However, first order 
stochastic dominance does give a closed partial order on the set of distributions on X. 

We can leverage this fact as follows. Suppose that in addition to the conditions of Definition 6,
the action set is a fixed lattice $A$ for all $x$, i.e., $A(x) = A$ for all $x$.$^4$ Further, suppose the model
primitives (payoff and transition kernel) are all continuous in state, action, and population state,
i.e., Assumption 1 is satisfied. Then Kleene's fixed point theorem (Kleene 1971) can be used to
establish existence of, and convergence to, a mean field equilibrium. Kleene's fixed point theorem
states that if X is a space with a closed partial order and a smallest element, then any monotone
continuous function from X to itself possesses a fixed point. We omit the details of this argument,
as it is essentially identical to our preceding development.

4 We employed the total ordering of $A(x)$ in proving the value function $V^*(x \mid f)$ is nondecreasing in $x$, via Lemma 4;
however, if $A$ does not depend on $x$, then it is straightforward to check that $V^*(x \mid f)$ is nondecreasing in $x$ even if $A$
is multidimensional.

We do emphasize, however, that our analysis of separable stochastic games is intimately tied 
to the assumption that state and action spaces are single-dimensional. In particular, our proof 
techniques rely heavily on the scalar nature of the action and kernel parameter spaces (cf. Lemma 
1); relaxing these conditions remains an open direction. 
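
Kleene's fixed point theorem, invoked above for the multidimensional extension, is constructive: iterating a monotone continuous map from the smallest element yields a nondecreasing sequence that reaches the least fixed point. The sketch below illustrates this on a toy finite chain {0, ..., 10}, where continuity is automatic. The map `F` is a hypothetical example chosen for illustration, not a model primitive from this paper.

```python
def kleene_iterate(F, start, max_iter=1000):
    """Iterate x_{k+1} = F(x_k) from `start` until a fixed point is reached."""
    x = start
    path = [x]
    for _ in range(max_iter):
        nxt = F(x)
        if nxt == x:
            return x, path
        x = nxt
        path.append(x)
    raise RuntimeError("no fixed point reached within max_iter")

def F(x):
    """Hypothetical monotone map on the chain {0, ..., 10}."""
    return min(10, (x + 7) // 2)

# From the smallest element the iterates increase to the least fixed point.
least, up_path = kleene_iterate(F, 0)
# From the largest element the iterates decrease to the greatest fixed point.
greatest, down_path = kleene_iterate(F, 10)
```

In this toy example the two iterations stop at different fixed points, mirroring the distinction between the "smallest" and "largest" equilibria discussed in the paper.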

Appendix A: Proofs: Section 3.1 

We start with the lemma that demonstrates that optimal strategies exist, and can be identified via 
Bellman's equation; the proof uses standard results from dynamic programming. 

Lemma 2. For each $f$, $\mathcal{P}(f)$ is nonempty. Furthermore, $\mu \in \mathcal{P}(f)$ if and only if for each $x$:

$$\mu(x) \in \arg\max_{a \in A(x)} \left\{ \pi(x, a, f) + \beta \int_{\mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(dx' \mid x, a, f) \right\}.$$

Proof. Throughout the proof we employ Definition 6. In particular, observe that the payoff
is continuous on a compact set $A$ and thus for fixed $f$, the payoff $\pi(x,a,f)$ is bounded. In
addition, for each fixed $x$, the next state is drawn from a countable set by assumption. Thus consider
maximization of the expected discounted profit over all possible (randomized, history-dependent)
strategies; by standard results in the theory of dynamic programming (see Bertsekas and Shreve
1978), it can be shown that if there exists an optimal strategy, there must exist an optimal
stationary, nonrandomized, Markov strategy, i.e., in our terminology, an oblivious strategy. Further,
$V^*$ satisfies Bellman's equation:

$$V^*(x \mid f) = \sup_{a \in A(x)} \left\{ \pi(x,a,f) + \beta \int_{\mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(dx' \mid x,a,f) \right\}; \tag{21}$$

and an oblivious strategy is optimal if and only if it attains the maximum on the right hand side
of the preceding expression for every $x$.

Observe also that for fixed $f$, $V^*(\cdot \mid f)$ is bounded, since the per stage payoffs are bounded and
the discount factor is less than one. Since the transition probability $\mathbf{P}(\cdot \mid x,a,f)$ is continuous w.r.t.
the topology of weak convergence, we conclude the objective function in (21) is continuous in $a$.
Because $A(x)$ is compact, the maximum is achieved for every $x$, and thus at least one optimal
strategy must exist, i.e., $\mathcal{P}(f)$ is nonempty. This proves the lemma. □

The next three lemmas combine to show that the value function $V^*(x \mid f)$ has increasing differences
in $x$ and $f$.

Lemma 3. Suppose that $U(x \mid f)$ is a nondecreasing bounded function in $x$ and has increasing
differences in $x$ and $f$. Define

$$T(x,a,f) = \int_{\mathcal{X}} U(x' \mid f)\, \mathbf{P}(dx' \mid x,a,f). \tag{22}$$

Then $T(x,a,f)$ is nondecreasing in $x$ and $a$, supermodular in $(x,a)$ and has increasing differences
in $(x,a)$ and $f$.

Proof. By Definition 6, $\mathbf{P}(\cdot \mid x,a,f)$ is stochastically nondecreasing in $x$ and $a$ and stochastically
supermodular in $(x,a)$. Since $U(x \mid f)$ is a nondecreasing bounded function, it follows from
the definition of stochastically nondecreasing and stochastically supermodular that $T(x,a,f)$ is
nondecreasing in $x$ and $a$ and supermodular in $(x,a)$ for fixed $f$.

Fix $x' \ge x$, $a' \ge a$, and $f' \succeq_{SD} f$ and define

$$\hat{T}(x,a,f,g) = \int_{\mathcal{X}} U(y \mid f)\, \mathbf{P}(dy \mid x,a,g).$$

To prove $T(x,a,f)$ has increasing differences in $(x,a)$ and $f$, it suffices to show that

$$\hat{T}(x',a',f',f') - \hat{T}(x',a',f,f) \ge \hat{T}(x,a,f',f') - \hat{T}(x,a,f,f). \tag{23}$$

Let us fix $g$; since $U(x \mid f)$ has increasing differences in $x$ and $f$, $U(x \mid f') - U(x \mid f)$ is a nondecreasing
function of $x$. Since $\mathbf{P}(\cdot \mid x',a,g) \succeq_{SD} \mathbf{P}(\cdot \mid x,a,g)$ by Definition 6, it follows that:

$$\hat{T}(x',a,f',g) - \hat{T}(x',a,f,g) \ge \hat{T}(x,a,f',g) - \hat{T}(x,a,f,g). \tag{24}$$

Also by Definition 6, $\mathbf{P}(\cdot \mid x',a',g) \succeq_{SD} \mathbf{P}(\cdot \mid x',a,g)$, which implies that

$$\hat{T}(x',a',f',g) - \hat{T}(x',a',f,g) \ge \hat{T}(x',a,f',g) - \hat{T}(x',a,f,g). \tag{25}$$

Using equations (24) and (25) and rearranging the terms, we get that

$$\hat{T}(x',a',f',g) - \hat{T}(x,a,f',g) \ge \hat{T}(x',a',f,g) - \hat{T}(x,a,f,g). \tag{26}$$

Now let $g' \succeq_{SD} g$ and note that $\mathbf{P}(\cdot \mid x,a,g)$ has increasing differences in $(x,a)$ and $g$ by Definition 6.
Also, note that $U(x \mid f)$ is a bounded nondecreasing function of $x$. This implies that $\hat{T}(x,a,f,g)$ has
increasing differences in $(x,a)$ and $g$. That is,

$$\hat{T}(x',a',f,g') - \hat{T}(x,a,f,g') \ge \hat{T}(x',a',f,g) - \hat{T}(x,a,f,g). \tag{27}$$

From equations (26) and (27), we get that for any $x' \ge x$, $a' \ge a$, $f' \succeq_{SD} f$, and $g' \succeq_{SD} g$ we have

$$\hat{T}(x',a',f',g') - \hat{T}(x,a,f',g') \ge \hat{T}(x',a',f,g) - \hat{T}(x,a,f,g).$$

Taking $f' = g'$ and $f = g$ in the above equation shows that equation (23) is true, which proves the
lemma. □






Lemma 4. Suppose that $U(x \mid f)$ is a nondecreasing bounded function in $x$ that has increasing
differences in $x$ and $f$. Define

$$U^*(x \mid f) = \sup_{a \in A(x)} \left\{ \pi(x,a,f) + \beta \int_{\mathcal{X}} U(x' \mid f)\, \mathbf{P}(dx' \mid x,a,f) \right\}.$$

Then $U^*(x \mid f)$ is nondecreasing in $x$ and has increasing differences in $x$ and $f$.
Proof. Define

$$W(x,a,f) = \pi(x,a,f) + \beta \int_{\mathcal{X}} U(x' \mid f)\, \mathbf{P}(dx' \mid x,a,f) = \pi(x,a,f) + \beta T(x,a,f).$$

From Lemma 3, we know that $T(x,a,f)$ is nondecreasing in $x$ and $a$, supermodular in $(x,a)$ and
has increasing differences in $(x,a)$ and $f$. From Definition 6, we get that $W(x,a,f)$ is nondecreasing
in $x$, supermodular in $(x,a)$ and has increasing differences in $(x,a)$ and $f$.

We now show that $U^*(x \mid f)$ is nondecreasing in $x$ for fixed $f$. Fix $x' > x$, and choose $a \in A(x)$
such that $W(x,a,f) = \sup_{a \in A(x)} W(x,a,f)$; such an action exists since $\pi$ and $\mathbf{P}$ are continuous in
$a$ and $A(x)$ is compact. Similarly, fix $a'$ such that $\pi(x',a',f) = \sup_{a \in A(x')} \pi(x',a,f)$.

We consider two cases. If $a \in A(x')$, then:

$$U^*(x \mid f) = \pi(x,a,f) + \beta T(x,a,f) \le \pi(x',a,f) + \beta T(x',a,f) \le \sup_{a \in A(x')} \left\{ \pi(x',a,f) + \beta T(x',a,f) \right\} = U^*(x' \mid f),$$

where the first inequality follows since $\pi$ and $T$ are both nondecreasing in $x$.

On the other hand, if $a \notin A(x')$, then it follows that $a' > a$, since $A$ is a nondecreasing
correspondence. (Note that this step uses the fact that the action set is totally ordered.) Thus:

$$U^*(x \mid f) = \pi(x,a,f) + \beta T(x,a,f) \le \pi(x',a',f) + \beta T(x,a,f) \le \pi(x',a',f) + \beta T(x',a',f) \le \sup_{a \in A(x')} \left\{ \pi(x',a,f) + \beta T(x',a,f) \right\} = U^*(x' \mid f).$$

Here the first inequality follows since $\sup_{a \in A(x)} \pi(x,a,f)$ is nondecreasing in $x$, and by our choice
of $a'$; and the second inequality follows since $T$ is nondecreasing in $x$ and $a$. We conclude that
$U^*(x \mid f)$ is nondecreasing in $x$ for fixed $f$.

To prove that $U^*(x \mid f)$ has increasing differences, we reason using an argument similar to Lemma
A.1 of Hopenhayn and Prescott (1992). Let $x_2 \ge x_1$ and $f_2 \succeq_{SD} f_1$. To prove that $U^*(x \mid f)$ has
increasing differences, we need to show that

$$U^*(x_2 \mid f_2) - U^*(x_1 \mid f_2) \ge U^*(x_2 \mid f_1) - U^*(x_1 \mid f_1). \tag{28}$$

To economize on notation, given two actions $a, a'$, we let $a \vee a' = \sup\{a,a'\}$, and let $a \wedge a' = \inf\{a,a'\}$.
Fix $a_1 \in A(x_1)$ and $a_2 \in A(x_2)$. Since $A(x)$ is a nondecreasing correspondence, we have
$a_1 \vee a_2 \in A(x_2)$ and $a_1 \wedge a_2 \in A(x_1)$. Thus we have:

$$\begin{aligned}
U^*(x_1 \mid f_1) + U^*(x_2 \mid f_2) &\ge W(x_1, a_1 \wedge a_2, f_1) + W(x_2, a_1 \vee a_2, f_2) \\
&= W(x_1, a_1 \wedge a_2, f_1) + W(x_2, a_1 \vee a_2, f_2) + W(x_1, a_1 \wedge a_2, f_2) - W(x_1, a_1 \wedge a_2, f_2) \\
&= W(x_1, a_1 \wedge a_2, f_1) + W(x_1 \vee x_2, a_1 \vee a_2, f_2) + W(x_1 \wedge x_2, a_1 \wedge a_2, f_2) - W(x_1, a_1 \wedge a_2, f_2).
\end{aligned}$$

The last equality follows from the fact that $x_1 \vee x_2 = x_2$ and $x_1 \wedge x_2 = x_1$. Since $W(x,a,f)$ is
supermodular in $(x,a)$ we get that

$$\begin{aligned}
U^*(x_1 \mid f_1) + U^*(x_2 \mid f_2) &\ge W(x_1, a_1 \wedge a_2, f_1) + W(x_2, a_2, f_2) + W(x_1, a_1, f_2) - W(x_1, a_1 \wedge a_2, f_2) \\
&= W(x_1, a_1, f_2) + W(x_2, a_2, f_1) - W(x_2, a_2, f_1) + W(x_2, a_2, f_2) + W(x_1, a_1 \wedge a_2, f_1) - W(x_1, a_1 \wedge a_2, f_2) \\
&\ge W(x_1, a_1, f_2) + W(x_2, a_2, f_1).
\end{aligned}$$

Here the last inequality follows from the fact that $W(x,a,f)$ has increasing differences in $(x,a)$
and $f$. Taking the supremum over $a_1$ and $a_2$ in the above inequality we get

$$U^*(x_1 \mid f_1) + U^*(x_2 \mid f_2) \ge U^*(x_1 \mid f_2) + U^*(x_2 \mid f_1),$$

which implies equation (28), and thus $U^*(x \mid f)$ has increasing differences in $x$ and $f$. This proves
the lemma. □

Lemma 5. $V^*(x \mid f)$ is nondecreasing in $x$ and has increasing differences in $x$ and $f$.

Proof. Let $V_0(x \mid f) = 0$ for all $x$, and let:

$$V_{k+1}(x \mid f) = \sup_{a \in A(x)} \left\{ \pi(x,a,f) + \beta \int_{\mathcal{X}} V_k(x' \mid f)\, \mathbf{P}(dx' \mid x,a,f) \right\}; \tag{29}$$

this is value iteration. By the preceding lemma, every $V_k$ is nondecreasing in $x$ and has increasing
differences in $x$ and $f$. Under our assumptions, value iteration converges starting from the zero
function (Bertsekas 2001), i.e., for all $x$, $V_k(x \mid f) \to V^*(x \mid f)$ as $k \to \infty$. Since monotonicity and
increasing differences are preserved upon taking limits, we conclude $V^*(x \mid f)$ is nondecreasing in $x$
and has increasing differences in $x$ and $f$. □
We now apply Topkis' Theorem in the next lemma to conclude the set of optimal strategies is 
monotone. 

Lemma 6. For each $x$ and $f$, define the set $\Omega(x,f)$ as:

$$\Omega(x,f) = \arg\max_{a \in A(x)} \left\{ \pi(x,a,f) + \beta \int_{\mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(dx' \mid x,a,f) \right\}. \tag{30}$$

Then $\Omega$ is nondecreasing in $(x,f)$.
Further:

$$\overline{p}(f)(x) = \sup \Omega(x,f); \qquad \underline{p}(f)(x) = \inf \Omega(x,f),$$

where $\overline{p}$ and $\underline{p}$ are the strategies defined in (9). Both $\overline{p}(f)$ and $\underline{p}(f)$ are nondecreasing in $f$, and for
fixed $f$ both strategies are also nondecreasing in $x$.

Proof. Observe that $\pi(x,a,f)$ is supermodular in $(x,a)$ and has increasing differences in $(x,a)$
and $f$ (by Definition 6); and $T(x,a,f)$ is supermodular in $(x,a)$ and has increasing differences in
$(x,a)$ and $f$, where $T$ is defined in equation (22), with $U = V^*$ (by Lemmas 3 and 5). Further,
$A(x)$ is an increasing correspondence. By Topkis' Theorem (Theorem 2.8.1 in Topkis 1998), we
conclude that $\Omega(x,f)$ is nondecreasing in $(x,f)$ wherever it is nonempty. By Lemma 2, however, the
maximum on the right hand side in (30) is always achieved, so $\Omega(x,f)$ is nondecreasing everywhere.

To conclude the proof, observe that by Lemma 2, $\overline{p}(f)$ must be the strategy that takes the largest
action in $\Omega(x,f)$ for each $x$, and $\underline{p}(f)$ must be the strategy that takes the smallest action in $\Omega(x,f)$
for each $x$. Monotonicity of $\overline{p}$ and $\underline{p}$ follows from the monotonicity properties of $\Omega$. □
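
The monotone selection argument can be illustrated on a small grid. The sketch below uses a hypothetical objective `W` (standing in for the bracketed expression in (30)) that is supermodular in the state and action and has increasing differences with a scalar parameter `f`; the scalar `f` is a simplification of the population state, assumed here only for illustration. It computes the maximizer set together with its largest and smallest elements.

```python
ACTIONS = range(7)            # totally ordered action grid {0, ..., 6}
XS = [0, 1, 2]                # states
FS = [0.0, 0.5, 1.0]          # scalar stand-in for the population state

def W(x, a, f):
    """Hypothetical objective: supermodular in (x, a), increasing differences in (x, a) and f."""
    return (x + f) * a - 0.5 * a * a

def omega(x, f):
    """Set of maximizers of W(x, ., f) over the action grid."""
    values = {a: W(x, a, f) for a in ACTIONS}
    best = max(values.values())
    # Exact equality is safe here because all values are exact binary fractions.
    return [a for a in ACTIONS if values[a] == best]

def p_upper(x, f):
    """Largest maximizer, the analogue of sup Omega(x, f)."""
    return max(omega(x, f))

def p_lower(x, f):
    """Smallest maximizer, the analogue of inf Omega(x, f)."""
    return min(omega(x, f))
```

For instance, `omega(0, 0.5)` contains both 0 and 1, so the largest and smallest selections differ there, yet each selection is nondecreasing in both arguments.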

We now turn our attention to $\mathcal{D}$. Given any strategy $\mu$ and population state $f$, define a map
$Q_{\mu,f} : \mathfrak{F} \to \mathfrak{F}$ according to the kernel induced by $\mu$ and $f$, i.e., for all Borel sets $S$:

$$Q_{\mu,f}(g)(S) = \int_{\mathcal{X}} \mathbf{P}(S \mid x, \mu(x), f)\, g(dx).$$

(This is equation (14).)

Lemma 7. Suppose $f' \succeq_{SD} f$, $g' \succeq_{SD} g$, and $\mu' \trianglerighteq \mu$, and both $\mu'$ and $\mu$ are nondecreasing, then

$$Q_{\mu',f'}(g') \succeq_{SD} Q_{\mu,f}(g).$$

Proof. Let $\phi$ be a bounded nondecreasing real-valued function on $\mathcal{X}$. We need to show that

$$\int_{\mathcal{X}} \int_{\mathcal{X}} \phi(y)\, \mathbf{P}(dy \mid x, \mu'(x), f')\, g'(dx) \ge \int_{\mathcal{X}} \int_{\mathcal{X}} \phi(y)\, \mathbf{P}(dy \mid x, \mu(x), f)\, g(dx). \tag{31}$$

Let us define

$$H(x; \mu, f) = \int_{\mathcal{X}} \phi(y)\, \mathbf{P}(dy \mid x, \mu(x), f).$$

Observe that:

$$\int_{\mathcal{X}} \int_{\mathcal{X}} \phi(y)\, \mathbf{P}(dy \mid x, \mu(x), f)\, g(dx) = \int_{\mathcal{X}} H(x; \mu, f)\, g(dx).$$

Let $x' > x$ and note that $\mu$ is a nondecreasing function of $x$. From Definition 6, we know
that $\mathbf{P}(\cdot \mid x,a,f)$ is stochastically nondecreasing in $(x,a)$, which implies that
$\mathbf{P}(\cdot \mid x', \mu(x'), f) \succeq_{SD} \mathbf{P}(\cdot \mid x, \mu(x), f)$. Since $\phi$ is a nondecreasing function, we get that
$H(x'; \mu, f) \ge H(x; \mu, f)$.

From Definition 6, we know that $\mathbf{P}(\cdot \mid x,a,f)$ is nondecreasing in $a$ and $f$. Thus, for any fixed $x$,
we have

$$\mathbf{P}(\cdot \mid x, \mu'(x), f') \succeq_{SD} \mathbf{P}(\cdot \mid x, \mu(x), f).$$

This along with the fact that $\phi$ is a nondecreasing function implies that $H(x; \mu', f') \ge H(x; \mu, f)$
for every fixed $x \in \mathcal{X}$.

We now reason as follows:

$$\int_{\mathcal{X}} H(x; \mu', f')\, g'(dx) \ge \int_{\mathcal{X}} H(x; \mu, f)\, g'(dx) \ge \int_{\mathcal{X}} H(x; \mu, f)\, g(dx).$$

Here the first inequality follows from the fact that $H(x; \mu', f') \ge H(x; \mu, f)$ and the second inequality
follows from the fact that $H(x'; \mu, f) \ge H(x; \mu, f)$ for $x' > x$, and that $g' \succeq_{SD} g$. This proves
equation (31) and hence proves the lemma. □
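
The monotonicity of the map $Q_{\mu,f}$ can be checked numerically on a finite chain. The sketch below is a toy illustration under stated assumptions: a hypothetical kernel that moves a player up one state with probability equal to the action, strategies given as lists of actions per state, and the dependence on the population state $f$ suppressed. It tests first order stochastic dominance by comparing cumulative distribution functions.

```python
from itertools import accumulate

def dominates(p, q, tol=1e-12):
    """True if p first-order stochastically dominates q on the ordered states 0..n-1."""
    return all(Fp <= Fq + tol for Fp, Fq in zip(accumulate(p), accumulate(q)))

def kernel(x, a, n):
    """Hypothetical kernel: move up one state with probability a, otherwise stay."""
    row = [0.0] * n
    row[x] += 1.0 - a
    row[min(x + 1, n - 1)] += a
    return row

def push_forward(g, mu):
    """Q_mu(g)(y) = sum_x kernel(y | x, mu[x]) * g[x]."""
    n = len(g)
    out = [0.0] * n
    for x in range(n):
        row = kernel(x, mu[x], n)
        for y in range(n):
            out[y] += g[x] * row[y]
    return out

# Two nondecreasing strategies with mu_hi >= mu_lo pointwise, and g_hi dominating g_lo.
mu_lo, mu_hi = [0.1, 0.2, 0.3], [0.2, 0.5, 0.9]
g_lo, g_hi = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]

image_lo = push_forward(g_lo, mu_lo)
image_hi = push_forward(g_hi, mu_hi)
```

With these inputs, `dominates(image_hi, image_lo)` evaluates the conclusion of the lemma for one pair of strategies and distributions; it is a spot check, not a proof.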

Lemma 8. Fix $\mu \in \mathfrak{M}_O$ and $f \in \mathfrak{F}$, and suppose $\mu$ is nondecreasing in $x$. Then $\mathcal{D}(\mu,f)$ is a
nonempty complete lattice. Further, $\overline{d}(\mu,f)$ and $\underline{d}(\mu,f)$ (as defined in (10)) exist and are both
invariant distributions of the Markov process induced by $\mu$ and $f$ (cf. (7)).

Finally, if $f' \succeq_{SD} f$ and $\mu' \trianglerighteq \mu$ and both $\mu$ and $\mu'$ are nondecreasing, then
$\overline{d}(\mu',f') \succeq_{SD} \overline{d}(\mu,f)$ and $\underline{d}(\mu',f') \succeq_{SD} \underline{d}(\mu,f)$.

Proof. By the preceding lemma, $Q_{\mu,f}(g)$ is nondecreasing in $g$. By Tarski's theorem, the set of
fixed points of $Q_{\mu,f}$ is a nonempty complete lattice. View the set of nondecreasing strategies $\mu$ as
a partially ordered set $\mathfrak{M}_O$, with the coordinate-wise ordering $\trianglerighteq$. Then $Q_{\mu,f}(\cdot)$ is a nondecreasing
function on $\mathfrak{M}_O \times \mathfrak{F} \times \mathfrak{F}$, by the preceding lemma. Theorem 2.5.2 in Topkis (1998) generalizes
Tarski's theorem to fixed points of a nondecreasing function parametrized by a partially ordered
set ($\mathfrak{M}_O \times \mathfrak{F}$); one consequence of this generalization is that the largest and smallest fixed points are
nondecreasing in the parameter. This generalization directly implies that both $\overline{d}(\mu,f)$ and $\underline{d}(\mu,f)$
are nondecreasing in $\mu$ and $f$. □

In the next lemma we establish existence of fixed points of $\Phi$, thus proving Theorem 2.

Lemma 9. Let $\overline{\Phi}(f)$ and $\underline{\Phi}(f)$ be defined as in (11). Then $\underline{\Phi}(f), \overline{\Phi}(f) \in \Phi(f)$. Further, both are
nondecreasing in $f$, and thus the sets of their fixed points are each nonempty complete lattices.

Thus there exists a mean field equilibrium for the stochastic game with complementarities $\Gamma$: in
particular, if $f$ is a fixed point of $\overline{\Phi}$ (resp., $\underline{\Phi}$), then $(\overline{p}(f), f)$ (resp., $(\underline{p}(f), f)$) is a mean field
equilibrium.

Proof. That $\Phi(f)$ is nonempty follows by Lemmas 2 and 8. Observe that if $f' \succeq_{SD} f$, then
$\overline{p}(f') \trianglerighteq \overline{p}(f)$ by Lemma 6. Further, $\overline{p}(f')$ and $\overline{p}(f)$ are both nondecreasing in $x$ as well, so by
Lemma 8, $\overline{d}(\overline{p}(f'), f') \succeq_{SD} \overline{d}(\overline{p}(f), f)$, establishing that $\overline{\Phi}$ is monotone. That $\overline{\Phi}(f) \in \Phi(f)$ follows
from the definition. The proof for $\underline{\Phi}(f)$ is identical. The conclusion regarding fixed points follows
from Tarski's theorem. □

Appendix B: Proofs: Section 3.2 

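Proof of Corollary 1. Since $\overline{p}(f) = \sup \mathcal{P}(f)$, and $\mu \in \mathcal{P}(f)$, we must have $\overline{p}(f) \trianglerighteq \mu$. Similarly
$\mu \trianglerighteq \underline{p}(f)$. Now since $\overline{p}(f)$ and $\underline{p}(f)$ are both nondecreasing strategies, by Lemma 8 we conclude that
$\overline{\Phi}(f) = \overline{d}(\overline{p}(f), f) \succeq_{SD} \overline{d}(\mu, f) \succeq_{SD} f$, where the last inequality follows because $\overline{d}(\mu,f) = \sup \mathcal{D}(\mu,f)$,
and $f \in \mathcal{D}(\mu,f)$. A similar argument shows that $f \succeq_{SD} \underline{\Phi}(f)$. The result now follows from Theorem
2.5.1 in Topkis (1998), which is a sharper version of Tarski's fixed point theorem; in particular, that
statement shows that $\overline{f} = \sup\{f' : \overline{\Phi}(f') \succeq_{SD} f'\}$, and $\underline{f} = \inf\{f' : f' \succeq_{SD} \underline{\Phi}(f')\}$. Since $f$ is contained
in both the former and the latter sets, we conclude $\overline{f} \succeq_{SD} f \succeq_{SD} \underline{f}$. The result regarding strategies
now follows from monotonicity of $\overline{p}$ and $\underline{p}$, and the fact that $\underline{p}(f) \trianglelefteq \mu \trianglelefteq \overline{p}(f)$. □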



Appendix C: Proofs: Section 4 

We start with two essential lemmas.

Lemma 10. Suppose that $f_0 \preceq_{SD} f_1 \preceq_{SD} f_2 \cdots$, and $\mu_0 \trianglelefteq \mu_1 \trianglelefteq \mu_2 \cdots$. Then there exists a distribution
$f^*$ and a strategy $\mu^*$ such that $f_t$ converges weakly to $f^*$ as $t \to \infty$, and $\mu_t$ converges pointwise to
$\mu^*$ as $t \to \infty$.

Proof. Note that since the set $\mathfrak{M}_O$ is compact and $\mu_t$ is a nondecreasing sequence of policies,
there must exist a pointwise limit $\mu^*$. Next, consider the distribution functions $F_t$ corresponding
to the measures $f_t$. By Prohorov's theorem, there exists a measure $f^* \in \mathfrak{F}$ and a subsequence $t_k$
such that $F_{t_k}(x) \to F^*(x)$ at all points of continuity of $F^*$, where $F^*$ is the distribution function
corresponding to $f^*$. Since $f_t$ is a monotone sequence, we have $F_t(x) \ge F_{t+1}(x)$ for all $x$, so in fact
$F_t(x) \to F^*(x)$ at all points of continuity of $F^*$. Thus $f_t$ converges weakly to $f^*$, as required. □

Lemma 11. Suppose Assumption 1 holds. Then $\overline{p}(f)$ and $\underline{p}(f)$ (cf. (9)) are both continuous in $f$,
and $Q_{\mu,g}(f)$ (cf. (14)) is continuous in $\mu$, $g$, and $f$, where we endow $\mathfrak{F}$ with the topology of weak
convergence, and $\mathfrak{M}$ with the topology of pointwise convergence.

Proof. Under Assumption 1, the first result follows using Theorem 1 of Dutta et al. (1994),
which establishes upper semicontinuity of $\Omega(x,f)$ in $f$ (where $\Omega$ is defined as in (30)). From this
it follows that $\overline{p}(f)$ and $\underline{p}(f)$ are continuous in $f$.

Next we show that $Q_{\mu,g}(f)$ as defined in (14) is continuous in $\mu$, $g$ and $f$, where we endow $\mathfrak{F}$ with
the topology of weak convergence, and $\mathfrak{M}$ with the topology of pointwise convergence. Suppose
that $g_n \to g$ (weakly), $f_n \to f$ (weakly), and that $\mu_n \to \mu$ (pointwise). Fix a bounded function $\phi$ on
$\mathcal{X}$, and define

$$H(x; \mu, g) = \int_{\mathcal{X}} \phi(y)\, \mathbf{P}(dy \mid x, \mu(x), g).$$

For every $x$, $H(x; \mu_n, g_n) \to H(x; \mu, g)$ by continuity of $\mathbf{P}(\cdot \mid x,a,g)$ in $a$ and $g$. Thus by Theorem 5.5
of Billingsley (1968), we have:

$$\int_{\mathcal{X}} H(x; \mu_n, g_n)\, f_n(dx) \to \int_{\mathcal{X}} H(x; \mu, g)\, f(dx).$$

The left hand side is the expected value of $\phi$ under $Q_{\mu_n,g_n}(f_n)$, and the right hand side is the
expected value of $\phi$ under $Q_{\mu,g}(f)$, so (weak) continuity of $Q_{\mu,g}(f)$ is proved. □

Proof of Proposition 1. Since $f_{t+1} = \underline{\Phi}(f_t)$, and $\underline{\Phi}$ is monotone by Lemma 9, it follows that
$f_0 \preceq_{SD} f_1 \preceq_{SD} f_2 \cdots$. Since $\mu_t = \underline{p}(f_t)$, and $\underline{p}$ is monotone in $f_t$ by Lemma 6, it follows that
$\mu_0 \trianglelefteq \mu_1 \trianglelefteq \mu_2 \cdots$. Finally, since $\underline{p}(f)(x)$ is nondecreasing in $x$ for every $f$, it follows that $\mu_t$ is nondecreasing.
By Lemma 10, there exists a limit $(\mu^*, f^*)$. Since every $\mu_t$ is nondecreasing in $x$, the limit $\mu^*$ must
be nondecreasing in $x$ as well.

We now show that if Assumption 1 holds, then the limit point $(\mu^*, f^*)$ is the smallest mean field
equilibrium. By Lemma 11, both $\underline{p}(f)$ and $Q_{\mu,g}(f)$ are continuous. This implies that $\mu_t = \underline{p}(f_t) \to \underline{p}(f^*)$
as $t \to \infty$, so $\mu^* = \underline{p}(f^*)$. Further, since $f_{t+1} = \underline{d}(\mu_t, f_t)$, it follows that $Q_{\mu_t,f_t}(f_{t+1}) = f_{t+1}$.
Taking limits on the left and right, we have $Q_{\mu^*,f^*}(f^*) = f^*$, i.e., $f^* \in \mathcal{D}(\mu^*, f^*)$. Thus we conclude
$(\mu^*, f^*)$ is a mean field equilibrium.

Let $\underline{f}$ be the smallest fixed point of $\underline{\Phi}(f)$, as defined in (12). Observe that at time 0, $f_0 \preceq_{SD} \underline{f}$,
since $f_0$ is the smallest distribution in the lattice $\mathfrak{F}$. Since $\underline{\Phi}$ is monotone, $f_t \preceq_{SD} \underline{f}$ for all $t$. Since $f_t$
converges weakly to $f^*$, we conclude $f^* \preceq_{SD} \underline{f}$. On the other hand, observe that $\mu^*$ is nondecreasing,
so by Corollary 1, we have $\underline{f} \preceq_{SD} f^*$, i.e., $f^* = \underline{f}$, as required. □

Proof of Proposition 2. We proceed by induction. First note that by Lemma 6, $\underline{p}(f)(x)$ is a
nondecreasing strategy in $x$ for each $f$, so every $\mu_t$ is nondecreasing. We start by observing that
$f_0$ is the smallest distribution in $\mathfrak{F}$ in the $\succeq_{SD}$ ordering, so $f_1 \succeq_{SD} f_0$ trivially. Since $\underline{p}$ is monotone
in $f$ by Lemma 6 we have $\mu_0 = \underline{p}(f_0) \trianglelefteq \mu_1$.

So now suppose that $f_0 \preceq_{SD} f_1 \preceq_{SD} \cdots \preceq_{SD} f_t$, and $\mu_0 \trianglelefteq \mu_1 \trianglelefteq \cdots \trianglelefteq \mu_t$. Define $Q_{\mu_t,f_t}$ according to (14).
Then by Lemma 7, $Q_{\mu_t,f_t}$ is nondecreasing; since $f_{t+1} = Q_{\mu_t,f_t}(f_t)$, we conclude $f_{t+1} \succeq_{SD} f_t$. The
same argument as the preceding paragraph then yields $\mu_{t+1} \trianglerighteq \mu_t$, as required. Applying Lemma 10
yields the convergence result; note that $\mu^*$ must be nondecreasing, since every $\mu_t$ is nondecreasing.

From Lemma 11, if Assumption 1 holds, we conclude that $\mu^* = \underline{p}(f^*)$, and $Q_{\mu^*,f^*}(f^*) = f^*$,
i.e., $f^* \in \mathcal{D}(\mu^*, f^*)$. Thus $(\mu^*, f^*)$ is a mean field equilibrium.

Let $\underline{f}$ be the smallest fixed point of $\underline{\Phi}(f)$, as defined in (12). Observe that at time 0, $f_0 \preceq_{SD} \underline{f}$,
since $f_0$ is the smallest distribution in the lattice $\mathfrak{F}$. Thus $\mu_0 = \underline{p}(f_0) \trianglelefteq \underline{p}(\underline{f})$, so
$f_1 = Q_{\mu_0,f_0}(f_0) \preceq_{SD} Q_{\underline{p}(\underline{f}),\underline{f}}(\underline{f}) = \underline{f}$, where the last equality follows since $\underline{f}$ must be an invariant distribution associated
with $\underline{p}(\underline{f})$ and $\underline{f}$. Proceeding inductively, we have $f_t \preceq_{SD} \underline{f}$ for all $t$. Since $f_t$ converges weakly to
$f^*$, we conclude $f^* \preceq_{SD} \underline{f}$. On the other hand, observe that $\mu^*$ is nondecreasing, so by Corollary 1,
we have $\underline{f} \preceq_{SD} f^*$, i.e., $f^* = \underline{f}$, as required. □

Appendix D: Proof: Section 5 

Proof Sketch for Theorem 3. Let $\mathcal{P}(f;\theta)$ denote the set of optimal oblivious strategies for an
agent given population state $f$ in the game $\Gamma(\theta)$; and let $\mathcal{D}(\mu,f;\theta)$ denote the set of invariant
distributions induced by the strategy $\mu$ and population state $f$ in the game $\Gamma(\theta)$. Let $\overline{p}(f;\theta)$ and
$\underline{p}(f;\theta)$ be defined as in (9) using $\mathcal{P}(f;\theta)$; and similarly, let $\overline{d}(\mu,f;\theta)$ and $\underline{d}(\mu,f;\theta)$ be defined as
in (10) using $\mathcal{D}(\mu,f;\theta)$. Using exactly the same reasoning as in the proof of Theorem 2, under
Assumption 2, it follows that both $\overline{p}(f;\theta)$ and $\underline{p}(f;\theta)$ are nondecreasing in $f$ and $\theta$, and
nondecreasing in state; and further, that $\overline{d}(\mu,f;\theta)$ and $\underline{d}(\mu,f;\theta)$ are nondecreasing in $\mu$, $f$, and $\theta$, when
restricted to strategies $\mu$ that are nondecreasing in the state. Letting $\overline{\phi}(f;\theta) = \overline{d}(\overline{p}(f;\theta), f;\theta)$ and
$\underline{\phi}(f;\theta) = \underline{d}(\underline{p}(f;\theta), f;\theta)$, it follows that $\overline{\phi}$ and $\underline{\phi}$ are nondecreasing in $f$ and $\theta$. By Theorem 2.5.2
in Topkis (1998), the largest and smallest fixed points of $\overline{\phi}(f;\theta)$ and $\underline{\phi}(f;\theta)$ are nondecreasing
in $\theta$. The result follows. □

Appendix E: Proofs: Section 6 

Proof Sketch for Theorem 4. Analogous to the proof of Theorem 2, we define $\underline{p}(\alpha) = \inf \mathcal{P}(\alpha)$,
and $\overline{p}(\alpha) = \sup \mathcal{P}(\alpha)$ (with the inf and sup taken coordinatewise); and $\overline{d}(\mu,\alpha) = \sup \mathcal{D}(\mu,\alpha)$, and
$\underline{d}(\mu,\alpha) = \inf \mathcal{D}(\mu,\alpha)$. Lemmas 2, 3, 4, 5, and 6 remain identical, except that the population state
$f$ is replaced by the population action distribution $\alpha$. Next, we define $Q_{\mu,\alpha}(g)$ as:

$$Q_{\mu,\alpha}(g)(S) = \int_{\mathcal{X}} \mathbf{P}(S \mid x, \mu(x), \alpha)\, g(dx).$$

An identical argument to the proof of Lemma 7 then shows that if $\alpha' \succeq_{SD} \alpha$, $g' \succeq_{SD} g$, and $\mu' \trianglerighteq \mu$,
and both $\mu'$ and $\mu$ are nondecreasing, then $Q_{\mu',\alpha'}(g') \succeq_{SD} Q_{\mu,\alpha}(g)$. (Here $\alpha'$ and $\alpha$ are population
action distributions; $g'$ and $g$ are population states; and $\mu'$ and $\mu$ are oblivious strategies.) It follows
that Lemma 8 remains identical as well, with the population state $f$ replaced by $\alpha$.

Finally, we turn our attention to $\hat{\mathcal{D}}$. Suppose that $\mu' \trianglerighteq \mu$ (where $\mu$ and $\mu'$ are both measurable
oblivious strategies), and $f' \succeq_{SD} f$ (where $f$ and $f'$ are both population states). Suppose $\phi : A \to \mathbb{R}$
is nondecreasing. Let $\alpha = \hat{\mathcal{D}}(\mu, f)$, and let $\alpha' = \hat{\mathcal{D}}(\mu', f')$. Then:

$$\int_A \phi(a)\, \alpha'(da) = \int_{\mathcal{X}} \phi(\mu'(x))\, f'(dx) \ge \int_{\mathcal{X}} \phi(\mu(x))\, f'(dx) \ge \int_{\mathcal{X}} \phi(\mu(x))\, f(dx) = \int_A \phi(a)\, \alpha(da).$$

The first inequality follows because $\mu'(x) \ge \mu(x)$ for all $x$, and $\phi$ is nondecreasing. The second
inequality follows because $\mu(x)$ is nondecreasing, so $\phi(\mu(x))$ is nondecreasing in $x$; and $f' \succeq_{SD} f$. It
follows from this argument that $\hat{\mathcal{D}}(\mu', f') \succeq_{SD} \hat{\mathcal{D}}(\mu, f)$, establishing that $\hat{\mathcal{D}}$ is monotone as well.

So now we define two functions $\overline{\Phi} : \mathfrak{F}_A \to \mathfrak{F}_A$ and $\underline{\Phi} : \mathfrak{F}_A \to \mathfrak{F}_A$, analogous to the definitions in
(11). Let $\overline{\Phi}(\alpha) = \hat{\mathcal{D}}(\overline{p}(\alpha), \overline{d}(\overline{p}(\alpha), \alpha))$, and let $\underline{\Phi}(\alpha) = \hat{\mathcal{D}}(\underline{p}(\alpha), \underline{d}(\underline{p}(\alpha), \alpha))$. Then both $\overline{\Phi}$ and $\underline{\Phi}$ are
monotone maps on the nonempty complete lattice $\mathfrak{F}_A$, so the sets of fixed points of both maps are
nonempty complete lattices by Tarski's fixed point theorem. In particular, any one of these fixed
points is a mean field equilibrium of the given action-coupled stochastic game. □

Appendix F: Proofs: Section 7

Proof of Lemma 1. Since $h$ is concave in $a$ for fixed $x$ (Definition 9), it is also continuous in $a$
for fixed $x$; since $A$ is a compact interval, it follows by the intermediate value theorem that $H(x)$ is
a compact interval. Now if $x' \ge x$, then $h(x',a) \ge h(x,a)$ for all $a$ (by Definition 9), so we conclude
$H(x)$ is nondecreasing in $x$.

Since $h(x,a)$ is nondecreasing in both $x$ and $a$, and $c(a)$ is nondecreasing in $a$, it follows that
$C(x,h)$ is nondecreasing in $h$ when $x$ is fixed, and nonincreasing in $x$ when $h$ is fixed. Equation (20)
also follows since $c(a)$ is nondecreasing in $a$. Convexity in $h$ follows by standard results in convex
optimization: since we restrict attention to $h \in H(x)$ and $h$ and $c$ are both nondecreasing in $a$, we
can rewrite the constraint $h(x,a) = h$ as $h(x,a) \ge h$ in the definition of $C(x,h)$, i.e., for $h \in H(x)$
we have:

$$C(x,h) = \inf_{a \in A :\, h(x,a) \ge h} c(a).$$

Now since $C(x,h)$ is defined via minimization of a convex objective function over a convex feasible
region parametrized by $h$, it is convex in $h$ (Bertsekas 2009).
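
The inequality-constrained form of $C(x,h)$ is easy to evaluate on a grid. The sketch below uses hypothetical primitives chosen only for illustration: $h(x,a) = x + \sqrt{a}$ on $A = [0,1]$ and $c(a) = a$, for which the cost function has the closed form $C(x,h) = (h - x)^2$ whenever $x \le h \le x + 1$. The argument named `target` plays the role of the kernel parameter value $h$.

```python
import math

ACTIONS = [k / 1000 for k in range(1001)]   # grid on the action interval A = [0, 1]

def h(x, a):
    """Hypothetical kernel parameter: concave and nondecreasing in a, nondecreasing in x."""
    return x + math.sqrt(a)

def c(a):
    """Hypothetical convex, nondecreasing action cost."""
    return a

def C(x, target):
    """Grid approximation of C(x, h) = inf { c(a) : a in A, h(x, a) >= target }."""
    feasible = [c(a) for a in ACTIONS if h(x, a) >= target]
    if not feasible:
        raise ValueError("target lies above H(x)")
    return min(feasible)
```

On this example the grid values track $(h - x)^2$ up to the grid resolution, so convexity in $h$, monotonicity in $x$, and decreasing differences in $(x,h)$ can all be spot-checked numerically.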

Finally, we establish the claim of decreasing differences. Fix $x$, $x'$, $h$, and $h'$ as in the statement of
the lemma. Define $a_1$, $a_2$, $a_3$, and $a_4$ as optimizing values of $a$ in the definition of $C(x,h)$, $C(x',h)$,
$C(x,h')$, and $C(x',h')$, respectively. We have $h(x,a_1) = h(x',a_2) = h$, and $h(x,a_3) = h(x',a_4) = h'$.

Observe that since $h' \ge h$ and $h$ is nondecreasing in action, $a_4 \ge a_2$, and $a_3 \ge a_1$. Further, since
$h$ is nondecreasing in $x$, we have $a_4 \le a_3$ and $a_2 \le a_1$. Let $\delta = a_4 - a_2$. Define $g(a) = -h(x,-a)$
for $a \in -A$; then observe that $g$ is a convex, nondecreasing function on $-A$. By Lemma 12 (see
below), we have:

$$g(-a_2) - g(-a_4) \ge g(-a_1) - g(-a_1 - \delta).$$

In terms of $h$, this implies:

$$h(x,a_4) - h(x,a_2) \ge h(x,a_1 + \delta) - h(x,a_1). \tag{32}$$

We can now show that $a_4 - a_2 \le a_3 - a_1$. We have:

$$h' - h = h(x',a_4) - h(x',a_2) \ge h(x,a_4) - h(x,a_2) \ge h(x,a_1 + \delta) - h(x,a_1).$$

Here the first inequality follows by supermodularity of $h$ in $(x,a)$ (Definition 9), and the second
inequality follows by (32). Since $h(x,a_1) = h$ and $h(x,a_3) = h'$, and $h$ is nondecreasing in action,
it follows that $a_1 + \delta \le a_3$, i.e., $a_3 - a_1 \ge a_4 - a_2$.

The result now follows by another application of Lemma 12 (see below), which implies:

$$c(a_3) - c(a_1) \ge c(a_4) - c(a_2),$$

or equivalently,

$$C(x,h') - C(x,h) \ge C(x',h') - C(x',h),$$

as required. □

Lemma 12. Let $S \subset \mathbb{R}$ be convex, and suppose $g : S \to \mathbb{R}$ is a nondecreasing convex function. Fix
$x, x', y, y' \in S$, such that $y \ge x$, $y' \ge y$, $x' \ge x$, and $y' - y \ge x' - x$. Then:

$$g(y') - g(y) \ge g(x') - g(x).$$

Proof. Define $\hat{y} = y + (x' - x)$. Clearly $y \le \hat{y} \le y'$, so $\hat{y} \in S$. Observe that $x \le y$ implies $x' \le \hat{y}$,
so we can choose $\alpha, \delta \in (0,1)$ such that:

$$\alpha x + (1 - \alpha)\hat{y} = x'; \qquad \delta x + (1 - \delta)\hat{y} = y.$$

In particular, it follows that $\alpha = (\hat{y} - x')/(\hat{y} - x)$, and $\delta = (\hat{y} - y)/(\hat{y} - x)$. Since $\hat{y} - x' = y - x$,
we conclude $\alpha + \delta = 1$. Applying convexity we have:

$$g(x') \le \alpha g(x) + (1 - \alpha) g(\hat{y}); \qquad g(y) \le \delta g(x) + (1 - \delta) g(\hat{y}).$$

Adding these together, and using the fact that $\alpha + \delta = 1$, we conclude $g(x') + g(y) \le g(x) + g(\hat{y})$,
or:

$$g(x') - g(x) \le g(\hat{y}) - g(y).$$

Finally, since $g$ is nondecreasing, $g(\hat{y}) \le g(y')$, and the result follows. □
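
The inequality in the lemma can be sanity-checked by brute force on a grid. The sketch below is illustrative only; the function names and the grid are hypothetical choices. It enumerates all quadruples satisfying the hypotheses and counts those for which the conclusion fails, for a given function `g`.

```python
import itertools
import math

def hypotheses_hold(x, x1, y, y1):
    """Hypotheses of the lemma, with x1 and y1 standing for x' and y'."""
    return y >= x and y1 >= y and x1 >= x and (y1 - y) >= (x1 - x)

def conclusion_holds(g, x, x1, y, y1, tol=1e-12):
    """Check g(y') - g(y) >= g(x') - g(x), up to a small floating point tolerance."""
    return g(y1) - g(y) >= g(x1) - g(x) - tol

def count_violations(g, grid):
    """Number of grid quadruples that satisfy the hypotheses but not the conclusion."""
    return sum(
        1
        for x, x1, y, y1 in itertools.product(grid, repeat=4)
        if hypotheses_hold(x, x1, y, y1) and not conclusion_holds(g, x, x1, y, y1)
    )

GRID = [0.5 * k for k in range(7)]   # 0, 0.5, ..., 3

def square(t):
    """Convex and nondecreasing on [0, 3]."""
    return t * t
```

A convex nondecreasing function such as `square` produces no violations on this grid, whereas a concave function such as `math.sqrt` does (for example at x = 0, x' = 1, y = 1, y' = 2), which shows that convexity is doing real work in the argument.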

Proof of Proposition 3. We simply check the conditions outlined in Definition 6. Observe that
$v(x,f)$ is nondecreasing in $x$ and has increasing differences in $x$ and $f$ by assumption, and
further, $\sup_{x \in \mathcal{X}} |v(x,f)| < \infty$. In addition $C(x,h)$ is convex in $h$ and nonincreasing in $x$, and has
decreasing differences in $x$ and $h$ by Lemma 1. It follows that $\hat{\pi}(x,h,f)$ is nondecreasing in $x$;
continuous in $h$; supermodular in $(x,h)$ (the latter as it is separable in $x$ and $h$); and has increasing
differences in $(x,h)$ and $f$. Furthermore, for fixed $h$ and $f$, $\sup_{x \in \mathcal{X}} \hat{\pi}(x,h,f) < \infty$. Thus the first
two conditions of Definition 6 are satisfied.

Next, we consider the transition kernel. Here the desired properties follow by assumption:
$\hat{\mathbf{P}}(\cdot \mid x,h,f)$ is trivially stochastically supermodular in $(x,h)$, since it does not depend on $x$ and $h$ is
scalar. By assumption the kernel is stochastically nondecreasing in $h$ and $f$, continuous in $h$, and
has increasing differences in $h$ and $f$.

Finally, note $H(x)$ is a compact interval, $H$ is nondecreasing, and $H(x) \subset [\underline{h}, \overline{h}]$ for all $x$, which
is also compact. Further, observe that for all $x$ and $f$:

$$\sup_{h \in H(x)} \hat{\pi}(x,h,f) = v(x,f) - \inf_{h \in H(x)} C(x,h) = v(x,f) - c(\underline{a}),$$

where the last step follows by Lemma 1. Since $v$ is nondecreasing in $x$, it follows that
$\sup_{h \in H(x)} \hat{\pi}(x,h,f)$ is nondecreasing in $x$. It follows that $\hat{\Gamma}$ is a stochastic game with
complementarities, as required. □

Proof of Theorem 5. Let $\hat{\Gamma}$ be the equivalent stochastic game with complementarities
constructed in Proposition 3. Suppose that $(\hat{\mu}, f)$ is a mean field equilibrium of $\hat{\Gamma}$; note that in this
case $\hat{\mu}$ is a strategy where $\hat{\mu}(x) \in H(x)$ for all $x \in \mathcal{X}$.

Define a new strategy $\mu : \mathcal{X} \to A$ as follows. For each $x$, let $\mu(x)$ be an action such that
$h(x, \mu(x)) = \hat{\mu}(x)$. That is, we choose the action $\mu(x)$ to yield exactly the kernel parameter $\hat{\mu}(x)$.
Then observe that $\pi(x, \mu(x), g) = \hat{\pi}(x, \hat{\mu}(x), g)$, for all $x$ and $g$. Since $\hat{\mu}$ is an optimal oblivious
strategy for a player given population state $f$ in $\hat{\Gamma}$, by construction of $\hat{\Gamma}$ the strategy $\mu$
maximizes the expected discounted payoff to a player given $f$ in the original game $\Gamma$. Further, since
$\mathbf{P}(\cdot \mid x, \mu(x), g) = \hat{\mathbf{P}}(\cdot \mid x, \hat{\mu}(x), g)$, it follows that $f$ is an invariant distribution of the strategy $\mu$. Thus
$(\mu, f)$ is a mean field equilibrium of the game $\Gamma$, as required. □

Lemma 13. Suppose that $\Gamma$ is a stochastic game satisfying all the conditions in Definition 9, except
that $v$ is not necessarily nondecreasing in $x$. Suppose in addition that $\mathbf{P}(\cdot \mid h, f)$ does not depend on
$f$. Then $\int_{\mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(dx' \mid h)$ has increasing differences in $h$ and $f$.

Proof. Let $U(x \mid f)$ be any function that has increasing differences in $x$ and $f$. Define:

$$T(h,f) = \int_{\mathcal{X}} U(x' \mid f)\, \mathbf{P}(dx' \mid h).$$

Fix $h' \ge h$ and $f' \succeq_{SD} f$. Then since $U(x \mid f)$ has increasing differences in $x$ and $f$, $U(x \mid f') - U(x \mid f)$
is a nondecreasing function of $x$. Since $\mathbf{P}(\cdot \mid h') \succeq_{SD} \mathbf{P}(\cdot \mid h)$ by Definition 9, it follows that:

$$\int_{\mathcal{X}} \left( U(x \mid f') - U(x \mid f) \right) \mathbf{P}(dx \mid h') \ge \int_{\mathcal{X}} \left( U(x \mid f') - U(x \mid f) \right) \mathbf{P}(dx \mid h).$$

This is exactly the relationship that $T(h',f') - T(h',f) \ge T(h,f') - T(h,f)$, so $T$ has increasing
differences in $h$ and $f$.

The remainder of the proof follows Lemmas 4 and 5. First, the same approach as the proof of
Lemma 4 can be used to show that $U^*(x \mid f)$ has increasing differences in $x$ and $f$, where:

$$U^*(x \mid f) = \sup_{h \in H(x)} \left\{ v(x,f) - C(x,h) + \beta \int_{\mathcal{X}} U(x' \mid f)\, \mathbf{P}(dx' \mid x,h) \right\}.$$

Finally, value iteration yields that $V^*(x \mid f)$ has increasing differences in $x$ and $f$, as is shown in
Lemma 5. □